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Abstract 

We give new evidence that quantum computers — moreover, rudimentary quantum computers 
built entirely out of linear-optical elements — cannot be efficiently simulated by classical comput- 
ers. In particular, we define a model of computation in which identical photons are generated, 
sent through a linear-optical network, then nonadaptively measured to count the number of 
photons in each mode. This model is not known or believed to be universal for quantum com- 
putation, and indeed, we discuss the prospects for realizing the model using current technology. 
On the other hand, we prove that the model is able to solve sampling problems and search 
problems that are classically intractable under plausible assumptions. 

Our first result says that, if there exists a polynomial-time classical algorithm that samples 
from the same probability distribution as a linear-optical network, then P*'' = BPP'^^, and 
hence the polynomial hierarchy collapses to the third level. Unfortunately, this result assumes 
an extremely accurate simulation. 

Our main result suggests that even an approximate or noisy classical simulation would al- 
ready imply a collapse of the polynomial hierarchy. For this, we need two unproven conjectures; 
the Permanent-of-Gaussians Conjecture, which says that it is ^P-hard to approximate the per- 
manent of a matrix A of independent A''(0, 1) Gaussian entries, with high probability over A; 
and the Permanent Anti- Concentration Conjecture, which says that |Per(A)| > ^/n\/po\y{n) 
with high probability over A. We present evidence for these conjectures, both of which seem 
interesting even apart from our application. 

This paper does not assume knowledge of quantum optics. Indeed, part of its goal is to 
develop the beautiful theory of noninteracting bosons underlying our model, and its connection 
to the permanent function, in a self-contained way accessible to theoretical computer scientists. 
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1 Introduction 

The Extended Church- Turing Thesis says that all computational problems that are efficiently solv- 
able by realistic physical devices, are efficiently solvable by a probabilistic Turing machine. Ever 
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since Shor's algorithm [52], we have known that this thesis is in severe tension with the currently- 
accepted laws of physics. One way to state Shor's discovery is this: 

Predicting the results of a given quantum-mechanical experiment, to finite accuracy, 
cannot be done by a classical computer in probabilistic polynomial time, unless factoring 
integers can as well. 

As the above formulation makes clear, Shor's result is not merely about some hypothetical 
future in which large-scale quantum computers are built. It is also a hardness result for a practical 
problem. For simulating quantum systems is one of the central computational problems of modern 
science, with applications from drug design to nanofabrication to nuclear physics. It has long 
been a major application of high-performance computing, and Nobel Prizes have been awarded for 
methods (such as the Density Functional Theory) to handle special cases. What Shor's result 
shows is that, if we had an efficient, general-purpose solution to the quantum simulation problem, 
then we could also break widely- used cryptosystems such as RSA. 

However, as evidence against the Extended Church- Turing Thesis, Shor's algorithm has two 
significant drawbacks. The first is that, even by the conjecture-happy standards of complexity 
theory, it is no means settled that factoring is classically hard. Yes, we believe this enough to base 
modern cryptography on it — but as far as anyone knows, factoring could be in BPP without causing 
any collapse of complexity classes or other disastrous theoretical consequences. Also, of course, 
there are subexponential-time factoring algorithms (such as the number field sieve) , and few would 
express confidence that they cannot be further improved. And thus, ever since Bernstein and 
Vazirani [9j defined the class BQP of quantumly feasible problems, it has been a dream of quantum 
computing theory to show (for example) that, if BPP = BQP, then the polynomial hierarchy would 
collapse, or some other "generic, foundational" assumption of theoretical computer science would 
fail. In this paper, we do not quite achieve that dream, but we come closer than one might have 
thought possible. 

The second, even more obvious drawback of Shor's algorithm is that implementing it scalably 
is well beyond current technology. To run Shor's algorithm, one needs to be able to perform 
arithmetic (including modular exponentiation) on a coherent superposition of integers encoded 
in binary. This does not seem much easier than building a universal quantum computerj^ In 
particular, it appears one first needs to solve the problem of fault-tolerant quantum computation, 
which is known to be possible in principle if quantum mechanics is valid jll|36], but might require 
decoherence rates that are several orders of magnitude below what is achievable today. 

Thus, one might suspect that proving a quantum system's computational power by having it 
factor integers encoded in binary is a bit like proving a dolphin's intelligence by teaching it to solve 
arithmetic problems. Yes, with heroic effort, we can probably do this, and perhaps we have good 
reasons to. However, if we just watched the dolphin in its natural habitat, then we might see it 
display equal intelligence with no special training at all. 

Following this analogy, we can ask: are there more "natural" quantum systems that already 
provide evidence against the Extended Church- Turing Thesis? Indeed, there are countless quan- 
tum systems accessible to current experiments — including high-temperature superconductors, Bose- 
Einstein condensates, and even just large nuclei and molecules — that seem intractable to simulate 

^One caveat is a result of Cleve and Watrous [13], that Shor's algorithm can be implemented using log-depth 
quantum circuits (that is, in BPP^''^''). But even here, fault-tolerance will presumably be needed, among other 
reasons because one still has polynomial latency (the log-depth circuit does not obey spatial locality constraints). 
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Figure 1: Galton's board, a simple "computer" to output samples from the binomial distribution. 



Prom MathWorld, littp://mathworld.wolfram.com/GaltonBoard.html 



on a classical computer, and largely for the reason a theoretical computer scientist would expect: 
namely, that the dimension of a quantum state increases exponentially with the number of parti- 
cles. The difficulty is that it is not clear how to interpret these systems as solving computational 
problems. For example, what is the "input" to a Bose-Einstein condensate? In other words, while 
these systems might be hard to simulate, we would not know how to justify that conclusion using 
the one formal tool (reductions) that is currently available to us. 

So perhaps the real question is this: do there exist quantum systems that are "intermediate" 
between Shor's algorithm and a Bose-Einstein condensate — in the sense that 

(1) they are significantly closer to experimental reality than universal quantum computers, but 

(2) they can be proved, under plausible complexity assumptions (the more "generic" the better), 
to be intractable to simulate classically? 

In this paper, we will argue that the answer is yes. 
1.1 Our Model 

We define and study a formal model of quantum computation with noninteracting bosons. Physi- 
cally, our model could be implemented using a linear- optical network, in which n identical photons 
pass through a collection of simple optical elements (beamsplitters and phaseshifters), and are then 
measured to determine their locations. In Section [3l we give a detailed exposition of the model 
that does not presuppose any physics knowledge. For now, though, it is helpful to imagine a rudi- 
mentary "computer" consisting of n identical balls, which are dropped one by one into a vertical 
lattice of pegs, each of which randomly scatters each incoming ball onto one of two other pegs. 
Such an arrangement — called Galton's board — is sometimes used in science museums to illustrate 
the binomial distribution (see Figure [1]) . The "input" to the computer is the exact arrangement 
A of the pegs, while the "output" is the number of balls that have landed at each location on the 
bottom (or rather, a sample from the joint distribution Va over these numbers). There is no 
interaction between pairs of balls. 

Our model is essentially the same as that shown in Figure [H except that instead of identical 
balls, we use identical bosons governed by quantum statistics. Other minor differences are that, 
in our model, the "balls" are each dropped from different starting locations, rather than a single 



4 



location; and the "pegs," rather than being arranged in a regular lattice, can be arranged arbitrarily 
to encode a problem of interest. 

Mathematically, the key point about our model is that, to find the probability of any particular 
output of the computer, one needs to calculate the permanent of an n x n matrix. This can be seen 
even in the classical case: suppose there are n balls and n final locations, and ball i has probability 
Uij of landing at location j. Then the probability of one ball landing in each of the n locations is 

n 

Per (yl) = ^ JJaia(i)' 

where A = Of course, in the classical case, the Oj/s are nonnegative real numbers — 

which means that we can approximate Per (A) in probabilistic polynomial time, by using the 
celebrated algorithm of Jerrum, Sinclair, and Vigoda [3D]. In the quantum case, by contrast, the 
Ojj's are complex numbers. And it is not hard to show that, given a general matrix A G C"^", even 
approximating Per (A) to within a constant factor is #P-complete. This fundamental difference 
between nonnegative and complex matrices is the starting point for everything we do in this paper. 

It is not hard to show that a boson computer can be simulated by a "standard" quantum 
computer (that is, in BQP). But the other direction seems extremely unlikely — indeed, it even 
seems unlikely that a boson computer can do universal classical computation! Nor do we have 
any evidence that a boson computer could factor integers, or solve any other decision or promise 
problem not in BP P. However, if we broaden the notion of a computational problem to encompass 
sampling and search problems, then the situation is quite different. 

1.2 Our Results 

In this paper we study BosonSampling: the problem of sampling, either exactly or approximately, 
from the output distribution of a boson computer. Our goal is to give evidence that this problem 
is hard for a classical computer. Our main results fall into three categories: 

(1) Hardness results for exact BosonSampling, which give an essentially complete picture of 
that case. 

(2) Hardness results for approximate BosonSampling, which depend on plausible conjectures 
about the permanents of i.i.d. Gaussian matrices. 

(3) A program aimed at understanding and proving the conjectures. 
We now discuss these in turn. 

1.2.1 The Exact Case 

Our first (easy) result, proved in Section |H says the following. 

Theorem 1 The exact BosonSampling problem is not efficiently solvable by a classical computer, 
unless P**^ = BPP'^'^ and the polynomial hierarchy collapses to the third level. 

More generally, let O be any oracle that "simulates boson computers, " in the sense that O takes 
as input a random string r (which O uses as its only source of randomness) and a description of a 
boson computer A, and returns a sample Oa (r) from the probability distribution over possible 
outputs of A. T/ien P#P C BPP^P". 
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In particular, even if the exact BosonSampling problem were solvable by a classical computer 
with an oracle for a PH problem, Theorem [T] would still imply that P"^*^ C BPP^"^ — and therefore 
that the polynomial hierarchy would collapse, by Toda's Theorem [60j. This provides evidence 
that quantum computers have capabilities outside the entire polynomial hierarchy, complementing 
the recent evidence of Aaronson [3] and Fefferman and Umans [20] . 

At least for a computer scientist, it is tempting to interpret Theorem [T] as saying that "the exact 
BosonSampling problem is ^^P-hard under BPP'^'^-reductions." Notice that this would have a 
shocking implication: that quantum computers (indeed, quantum computers of a particularly simple 
kind) could efficiently solve a ^^P-hard problem! 

There is a catch, though, arising from the fact that BosonSampling is a sampling problem 
rather than a decision problem. Namely, if O is an oracle for sampling from the boson distribution 
T^A, then Theorem [1] shows that P*"^ C BPP'^'^ — but only if the BPP'^^ machine gets to fix the 
random bits used by O. This condition is clearly met if O is a classical randomized algorithm, 
since we can always interpret a randomized algorithm as just a deterministic algorithm that takes 
a random string r as part of its input. On the other hand, the condition would not be met if we 
implemented O (for example) using the boson computer itself. In other words, our "reduction" 
from T^P-complete problems to BosonSampling makes essential use of the hypothesis that we 
have a classical BosonSampling algorithm. 

We will give two proofs of Theorem [TJ In the first proof, we consider the probability p of some 
particular basis state when a boson computer is measured. We then prove two facts: 

(1) Even approximating p to within a multiplicative constant is a ^^P-hard problem. 

(2) // we had a polynomial-time classical algorithm for exact BosonSampling, then we could 
approximate p to within a multiplicative constant in the class BPP'^^, by using a standard 
technique called universal hashing. 

Combining facts (1) and (2), we find that, if the classical BosonSampling algorithm exists, 
then P*^ = BPP'^^, and therefore the polynomial hierarchy collapses. 

Our second proof was inspired by independent work of Bremner, Jozsa, and Shepherd ^lOj. In 
this proof, we start with a result of Knill, Laflamme, and Milburn [35], which says that linear 
optics with adaptive measurements is universal for BQP. A straightforward modification of their 
construction shows that linear optics with postselected measurements is universal for PostBQP (that 
is, quantum polynomial-time with postselection on possibly exponentially-unlikely measurement 
outcomes). Furthermore, Aaronson [2] showed that PostBQP = PP. On the other hand, if a 
classical BosonSampling algorithm existed, then we will show that we could simulate postselected 
linear optics in PostBPP (that is, classical polynomial-time with postselection, also called BPPpath)- 
We would therefore get 

BPPpath = PostBPP = PostBQP = PP, 

which is known to imply a collapse of the polynomial hierarchy. 

Despite the simplicity of the above arguments, there is something conceptually striking about 
them. Namely, starting from an algorithm to simulate quantum mechanics, we get an algorithrrd 
to solve ^P-complete problems — even though solving ^P-complete problems is believed to be well 
beyond what a quantum computer itself can do! Of course, one price we pay is that we need to 

^Admittedly, a BPP^'' algorithm. 
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talk about sampling problems rather than decision problems. If we do so, though, then we get 
to base our belief in the power of quantum computers on P*^ 7^ BPP'^'^, which is a much more 
"generic" (many would say safer) assumption than Factoring^ BPP. 

As we see it, the central drawback of Theorem [T] is that it only addresses the consequences 
of a fast classical algorithm that exactly samples the boson distribution Da- One can relax this 
condition slightly: if the oracle O samples from some distribution D'^ whose probabilities are all 

multiplicatively close to those in 7)^, then we still get the conclusion that P*^ C BPP'^^ . In our 
view, though, multiplicative closeness is already too strong an assumption. At a minimum, given 
as input an error parameter e > 0, we ought to let our simulation algorithm sample from some 
distribution I?^ such that — < e (where ||-|| represents total variation distance), using 
poly (n, 1/e) time. 

Why are we so worried about this issue? One obvious reason is that noise, decoherence, photon 
losses, etc. will be unavoidable features in any real implementation of a boson computer. As a 
result, not even the boson computer itself can sample exactly from the distribution D^l So it 
seems arbitrary and unfair to require this of a classical simulation algorithm. 

A second, more technical reason to allow error is that later, we would like to show that a 
boson computer can solve classically-intractable search problems, in addition to sampling problems. 
However, while Aaronson [4] proved an extremely general connection between search problems and 
sampling problems, that connection only works for approximate sampling, not exact sampling. 

The third, most fundamental reason to allow error is that the connection we are claiming, 
between quantum computing and ^T^P-complete problems, is so counterintuitive. One's first urge 
is to dismiss this connection as an artifact of poor modeling choices. So the burden is on us to 
demonstrate the connection's robustness. 

Unfortunately, the proof of Theorem [1] fails completely when we consider approximate sampling 
algorithms. The reason is that the proof hinges on the #P-completeness of estimating a single, 
exponentially-small probability p. Thus, if a sampler "knew" which p we wanted to estimate, then 
it could adversarially choose to corrupt that p. It would still be a perfectly good approximate 
sampler, but would no longer reveal the solution to the #P-complete instance that we were trying 
to solve. 

1.2.2 The Approximate Case 

To get around the above problem, we need to argue that a boson computer can sample from 
a distribution P that "robustly" encodes the solution to a ^^P-complete problem. This means 
intuitively that, even if a sampler was badly wrong about any e fraction of the probabilities in T>, 
the remaining 1 — e fraction would still allow the #P-complete problem to be solved. 

It is well-known that there exist ^^P-complete problems with worst- case/average- case equiva- 
lence, and that one example of such a problem is the permanent, at least over finite fields. This 
is a reason for optimism that the sort of robust encoding we need might be possible. Indeed, it 
was precisely our desire to encode the "robustly #P-complete" permanent function into a quantum 
computer's amplitudes that led us to study the noninteracting-boson model in the first place. That 
this model also has great experimental interest simply came as a bonus. 

In this paper, our main technical contribution is to prove a connection between the ability of 
classical computers to solve the approximate BosonSampling problem and their ability to approx- 
imate the permanent. This connection "almost" shows that even approximate classical simulation 
of boson computers would imply a collapse of the polynomial hierarchy. There is still a gap in 
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the argument, but it has nothing to do with quantum computing. The gap is simply that it is 
not known, at present, how to extend the worst-case/average-case equivalence of the permanent 
from finite fields to suitably analogous statements over the reals or complex numbers. We will 
show that, if this gap can be bridged, then there exist search problems and approximate sampling 
problems that are solvable in polynomial time by a boson computer, but not by a BPP machine 
unless P#P = BPP^P. 

More concretely, consider the following problem, where the GPE stands for Gaussian Per- 
manent Estimation: 

Problem 2 (|GPE|^) Given as input a matrix X ~ AA(0, 1)^^" of i.i.d. Gaussians, together with 
error hounds £,5 > 0, estimate |Per to within additive error ±e • n\, with probability at least 

1 — 5 over X , in poly {n,l/e, 1/6) time. 

Then our main result is the following. 

Theorem 3 (Main Result) Let T>a be the probability distribution sampled by a boson computer 
A. Suppose there exists a classical algorithm C that takes as input a description of A as well as 
an error bound e, and that samples from a probability distribution such that \\T>'j^ — T)a\\ < £ in 
poly (1^1 ,1/e) time. Then the |GPE|^ problem is solvable in BPP'^'^. Indeed, if we treat C as a 

black box, then |GPE|^ G BPP^^'''. 

Theorem [3] is proved in Section O The key idea of the proof is to "smuggle" the |GPE|^ 
instance X that we want to solve into the probability of a random output of a boson computer 
A. That way, even if the classical sampling algorithm C is adversarial, it will not know which of 
the exponentially many probabilities in Da is the one we care about. And therefore, provided C 
correctly approximates most probabilities in Da, with high probability it will correctly approximate 
"our" probability, and will therefore allow |Per(X)|^ to be estimated in BPP'^^. 

Besides this conceptual step, the proof of Theorem [3] also contains a technical component that 
might find other applications in quantum information. This is that, if we choose an m x m unitary 
matrix U randomly according to the Haar measure, then any n x n submatrix of U will be close 
in variation distance to a matrix of i.i.d. Gaussians, provided that n < m}/^ . Indeed, the fact 
that i.i.d. Gaussian matrices naturally arise as submatrices of Haar unitaries is the reason why we 
will be so interested in Gaussian matrices in this paper, rather than Bernoulli matrices or other 
well-studied ensembles. 

In our view, Theorem [3] already shows that fast, approximate classical simulation of boson 
computers would have a surprising complexity consequence. For notice that, if X ~ (0, l)^^"' is 
a complex Gaussian matrix, then Per {X) is a sum of n! complex terms, almost all of which usually 
cancel each other out, leaving only a tiny residue exponentially smaller than n\. A priori^ there 
seems to be little reason to expect that residue to be approximable in the polynomial hierarchy, let 
alone in BPP^^. 

1.2.3 The Permanents of Gaussian Matrices 

One could go further, though, and speculate that estimating Per {X) for Gaussian X is actually 
T^P-hard. We call this the Permanent- of- Gaussians Gonjecture, or PGCH We prefer to state the 

■^The name is a pun on the well-known Unique Games Conjecture (UGC) [32], which says that a certain approxi- 
mation problem that "ought" to be NP-hard really is NP-hard. 
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PGC using a more "natural" variant of the Gaussian Permanent Estimation problem than 
|GPE|4. The more natural variant talks about estimating Per (X) itself, rather than |Per(X)| , 
and also asks for a multiplicative rather than additive approximation. 



Problem 4 (GPEx) Given as input a matrix X ~AA(0, 1)^^" of i.i.d. Gaussians, together with 
error bounds e,5 > 0, estimate Per (X) to within error ite • |Per {X)\, with probability at least 1 — 6 
over X , m poly (n, 1/e, 1/(5) time. 

Then the main complexity-theoretic challenge we offer is to prove or disprove the following: 

Conjecture 5 (Permanent-of-Gaussians Conjecture or PGC) GPEx is^P-hard. In other 
words, if O is any oracle that solves GPEx, then P*^ C BPP'^. 

2 

Of course, a question arises as to whether one can bridge the gap between the |GPE|_|_ problem 
that appears in Theorem [3l and the more "natural" GPEx problem used in Conjecture [3 We 
are able to do so assuming another conjecture, this one an extremely plausible anti-concentration 
bound for the permanents of Gaussian random matrices. 

Conjecture 6 (Permanent Anti-Concentration Conjecture) There exists a polynomial p such 
that for all n and 5 > 0, 



Pr 



|Per(X)|< 



< 5. 



p{n,l/6) 

In Section [71 we give a complicated reduction that proves the following: 

Theorem 7 Suppose the Permanent Anti- Concentration Conjecture holds. Then |GPE|^ and 
GPEx are polynomial-time equivalent. 

Figure [2] summarizes the overall structure of our hardness argument for approximate BOSON- 
Sampling. 

The rest of the body of the paper aims at a better understanding of Conjectures [5] and [6l 

First, in SectionEl we summarize the considerable evidence for the Permanent Anti-Concentration 
Conjecture. This includes numerical results; a weaker anti-concentration bound for the permanent 
recently proved by Tao and Vu [5^; another weaker bound that we prove; and the analogue of 
Conjecture [6] for the determinant. 

Next, in Section [9l we discuss the less certain state of affairs regarding the Permanent-of- 
Gaussians Conjecture. On the one hand, we extend the random self-reducibility of permanents 
over finite fields proved by Lipton [39], to show that exactly computing the permanent of most 
Gaussian matrices A ~ A/" (0, l)^^" is #P-hard. On the other hand, we also show that extending 
this result further, to show that approximating Per (A) for Gaussian A is #P-hard, will require 
going beyond Lipton's polynomial interpolation technique in a fundamental way. 

Two appendices give some additional results. First, in Appendix ll21 we present two remarkable 
algorithms due to Gurvits ^27J (with Gurvits's kind permission) for solving certain problems related 
to linear-optical networks in classical polynomial time. We also explain why these algorithms do 
not confiict with our hardness conjecture. Second, in AppendixdH we bring out a useful fact that 
was implicit in our proof of Theorem [3l but seems to deserve its own treatment. This is that, if 
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Figure 2: Summary of our hardness argument (modulo conjectures). If there exists a polynomial- 
time classical algorithm for approximate BosonSampling, then Theorem [3] says tiiat |GPE|4 G 
BPP^P. Assuming Conjecture [6|(the PACC), Theorem [7] says that this is equivalent to GPEx € 
BPP^'^. Assuming Conjecture [5](the PGC), this is in turn equivalent to P*'^ = BPP'^^, which 
collapses the polynomial hierarchy by Toda's Theorem [60j . 

we have n identical bosons scattered among m locations, with no two bosons in the same 

location, and if we apply a Haar-random m x m unitary transformation U and then measure the 
number of bosons in each location, with high probability we will still not find two bosons in the 
same location. In other words, at least asymptotically, the birthday paradox works the same way 
for identical bosons as for classical particles, in spite of bosons' well-known tendency to cluster in 
the same state. 

1.3 Experimental Implications 

An important motivation for our results is that they immediately suggest a linear-optics experiment, 
which would use simple optical elements (beamsplitters and phaseshifters) to induce a Haar-random 
m X m unitary transformation U on an input state of n photons, and would then check that the 
probabilities of various final states of the photons correspond to the permanents of n x n submatrices 
of U, as predicted by quantum mechanics. Were such an experiment successfully scaled to large 
values of n, Theorem [3] asserts that no polynomial-time classical algorithm could simulate the 
experiment even approximately, unless |GPE|^ G BPP'^'^. 

Of course, the question arises of how large n has to be before one can draw interesting conclu- 
sions. An obvious difficulty is that no finite experiment can hope to render a decisive verdict on 
the Extended Church- Turing Thesis, since the ECT is a statement about the asymptotic limit as 
n — )• oo. Indeed, this problem is actually worse for us than for (say) Shor's algorithm, since unlike 
with Factoring, we do not believe there is any NP witness for BosonSampling. In other words, 
if n is large enough that a classical computer cannot solve BosonSampling, then n is probably 
also large enough that a classical computer cannot even verify that a quantum computer is solving 
BosonSampling correctly. 

Yet while this sounds discouraging, it is not really an issue from the perspective of near-term 
experiments. For the foreseeable future, n being too large is likely to be the least of one's problems! 
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If one could implement our experiment with (say) 20 < n < 30, then certainly a classical computer 
could verify the answers — but at the same time, one would be getting direct evidence that a quantum 
computer could efficiently solve an "interestingly difficult" problem, one for which the best-known 
classical algorithms require many millions of operations. While disproving the Extended Church- 
Turing Thesis is formally impossible, such an experiment would arguably constitute the strongest 
evidence against the ECT to date. 

Section [6] goes into more detail about the physical resource requirements for our proposed 
experiment, as well as how one would interpret the results. In Section [6l we also show that the 
size and depth of the linear-optical network needed for our experiment can both be improved by 
polynomial factors over the naive bounds. Complexity theorists who are not interested in the 
"practical side" of boson computation can safely skip Section [U while experimentalists who are 
only interested the practical side can skip everything else. 

While most further discussion of experimental issues is deferred to Section [H there is one ques- 
tion we need to address now. Namely: what, if any, are the advantages of doing our experiment, as 
opposed simply to building a somewhat larger "conventional" quantum computer, able (for example) 
to factor 10-digit numbers using Shor's algorithm? While a full answer to this question will need 
to await detailed analysis by experimentalists, let us mention four aspects of BosonSampling that 
might make it attractive for quantum computing experiments. 

(1) Our proposal does not require any explicit coupling between pairs of photons. It therefore 
bypasses what has long been seen as one of the central technological obstacles to building a 
scalable quantum computer: namely, how to make arbitrary pairs of particles "talk to each 
other" (e.g., via two-qubit gates), in a manner that still preserves the particles' coherence. 
One might ask how there is any possibility of a quantum speedup, if the particles are never 
entangled. The answer is that, because of the way boson statistics work, every two identical 
photons are somewhat entangled "for free," in the sense that the amplitude for any process 
involving both photons includes contributions in which the photons swap their states. This 
"free" entanglement is the only kind that our model ever uses. 

(2) Photons traveling through linear-optical networks are known to have some of the best coher- 
ence properties of any quantum system accessible to current experiments. From a "tradi- 
tional" quantum computing standpoint, the disadvantages of photons are that they have no 
direct coupling to one another, and also that they are extremely difficult to store (they are, 
after all, traveling at the speed of light). There have been ingenious proposals for working 
around these problems, most famously the adaptive scheme of Knill, Laflamme, and Milburn 
|35j . By contrast, rather than trying to remedy photons' disadvantages as qubits, our pro- 
posal simply never uses photons as qubits at all, and thereby gets the coherence advantages 
of linear optics without having to address the disadvantages. 

(3) To implement Shor's algorithm, one needs to perform modular arithmetic on a coherent su- 
perposition of integers encoded in binary. Unfortunately, this requirement causes significant 
constant blowups, and helps to explain why the "world record" for implementations of Shor's 
algorithm is still the factoring of 15 into 3x5, ffist demonstrated in 2001 [64] • By contrast, 
because the BosonSampling problem is so close to the "native physics" of linear-optical net- 
works, an n-photon experiment corresponds directly to a problem instance of size n, which 
involves the permanents of n x n matrices. This raises the hope that, using current technol- 
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ogy, one could sample quantum-mechanically from a distribution in which the probabilities 
depended (for example) on the permanents of 10 x 10 matrices of complex numbers. 

(4) The resources that our experiment does demand — including reliable single-photon sources and 
photodetector arrays — are ones that experimentalists, for their own reasons, have devoted 
large and successful efforts to improving within the past decade. We see every reason to 
expect further improvements. 

In implementing our experiment, the central difficulty is likely to be getting a reasonably-large 
probability of an n-photon coincidence: that is, of all n photons arriving at the photodetectors at 
the same time (or rather, within a short enough time interval that interference is seen). If the 
photons arrive at different times, then they effectively become distinguishable particles, and the 
experiment no longer solves the BosonSampling problem. Of course, one solution is simply to 
repeat the experiment many times, then postselect on the n-photon coincidences. However, if the 
probability of an n-photon coincidence decreases exponentially with n, then this "solution" has 
obvious scalability problems. 

// one could scale our experiment to moderately large values of n (say, 10 or 20), without the 
probability of an n-photon coincidence falling off dramatically, then our experiment would raise 
the exciting possibility of doing an interestingly-large quantum computation without any need for 
explicit quantum error-correction. Whether or not this is feasible is the main open problem we 
leave for experimentalists. 

1.4 Related Work 

By necessity, this paper brings together many ideas from quantum computing, optical physics, and 
computational complexity. In this section, we try to survey the large relevant literature, organizing 
it into eight categories. 

Quantum computing with linear optics. There is a huge body of work, both experimental 
and theoretical, on quantum computing with linear optics. Much of that work builds on a seminal 
2001 result of Knill, Lafiamme, and Milburn [35], showing that linear optics combined with adaptive 
measurements is universal for quantum computation. It is largely because of this result that linear 
optics is considered a viable proposal for building a universal quantum computer0 

In the opposite direction, several interesting classes of linear-optics experiments have been 
proved to be efficiently simulable on a classical computer. For example, Bartlett and Sanders 
[H] showed that a linear-optics network with coherent-state inputs and possibly-adaptive Gaussian 
measurements can be simulated in classical polynomial time. (Intuitively, a coherent state — 
the output of a standard laser — is a superposition over different numbers of photons that behaves 
essentially like a classical wave, while a Gaussian measurement is a measurement that preserves this 
classical wave behavior.) Also, Gurvits \27] showed that, in any n-photon linear-optics experiment, 
the probability of measuring a particular basis state can be estimated to within ite additive error 

^An earlier proposal for building a universal optical quantum computer was to use nonlinear optics: in other 
words, explicit entangling interactions between pairs of photons. (See Nielsen and Chuang [32] for discussion.) The 
problem is that, at least at low energies, photons have no direct coupling to one another. It is therefore necessary 
to use other particles as intermediaries, which greatly increases decoherence, and negates many of the advantages of 
using photons in the first place. 
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in poly (n, 1/e) time|j He also showed that the marginal distribution over any k photon modes can 
be computed deterministically in rp^^^ time. We discuss Gurvits's results in detail in Appendix 

ED 

Our model can be seen as intermediate between the above two extremes: unlike Knill et al. 
[35] . we do not allow adaptive measurements, and as a result, our model is probably not universal 
for BQP. On the other hand, unlike Bartlett and Sanders, we do allow single-photon inputs and 
photon-number measurements; and unlike Gurvits [27], we consider the complexity of sampling 
from the joint distribution over all poly (n) photon modes. Our main result gives strong evidence 
that the resulting model cannot be simulated in classical polynomial time. On the other hand, it 
might be significantly easier to implement than a universal quantum computer. 

Intermediate models of quantum computation. By now, several interesting models of 
quantum computation have been proposed that are neither known to be universal for BQP, nor 
simulable in classical polynomial time. A few examples, besides the ones mentioned elsewhere in 
the paper, are the "one-clean-qubit" model of Knill and Laflamme [34j; the permutational quantum 
computing model of Jordan [31]; and stabilizer circuits with non-stabilizer initial states (such as 
cos I |0)-|-sin ^ |0)) and nonadaptive measurements [5]. The noninteracting-boson model is another 
addition to this list. 

The Hong-Ou-Mandel dip. In 1987, Hong, Ou, and Mandel [29] performed a now-standard 
experiment that, in essence, directly confirms that iwo-photon amplitudes correspond to 2 x 2 
permanents in the way predicted by quantum mechanics. From an experimental perspective, what 
we are asking for could be seen as a generalization of the so-called "Hong-Ou-Mandel dip" to the 
n-photon case, where n is as large as possible. Lim and Beige [38] previously proposed an n-photon 
generalization of the Hong-Ou-Mandel dip, but without the computational complexity motivation. 

Bosons and the permanent. Bosons are one of the two basic types of particle in the uni- 
verse; they include photons and the carriers of nuclear forces. It has been known since work by 
Caianiello [13] in 1953 (if not earlier) that the amplitudes for n-boson processes can be written as 
the permanents of n x n matrices. Meanwhile, Valiant [62j proved in 1979 that the permanent 
is #P-complete. Interestingly, according to Valiant (personal communication), he and others put 
these two facts together immediately, and wondered what they might mean for the computational 
complexity of simulating bosonic systems. To our knowledge, however, the first authors to dis- 
cuss this question in print were Troyansky and Tishby [6T] in 1996. Given an arbitrary matrix 
A G C"^"^, these authors showed how to construct a quantum observable with expectation value 
equal to Per [A). However, they correctly pointed out that this did not imply a polynomial-time 
quantum algorithm to calculate Per (A), since the variance of their observable was large enough 
that exponentially many samples would be needed. 

Later, Scheel [4^ explained how permanents arise as amplitudes in linear-optical networks, 
and noted that calculations involving linear-optical networks might be intractable because the 
permanent is #P-complete. 

Fermions and the determinant. Besides bosons, the other basic particles in the universe are 
fermions; these include matter particles such as quarks and electrons. Remarkably, the amplitudes 

'''While beautiful, this result is of limited use in practice — since in a typical linear-optics experiment, the probability 
p of measuring any specific basis state is so small that is a good additive estimate to p. 
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for n-fermion processes are given not by permanents but by determinants olnxn matrices. Despite 
the similarity of their definitions, it is well-known that the permanent and determinant differ 
dramatically in their computational properties; the former is ^^P-complete while the latter is in 
P. In a lecture in 2000, Wigderson called attention to this striking connection between the boson- 
fermion dichotomy of physics and the permanent-determinant dichotomy of computer science. He 
joked that, between bosons and fermions, "the bosons got the harder job." One could view this 
paper as a formalization of Wigderson's joke. 

To be fair, half the work of formalizing Wigderson's joke has already been carried out. In 
2002, Valiant [63] defined a beautiful subclass of quantum circuits called matchgate circuits, and 
showed that these circuits could be efficiently simulated classically, via a nontrivial algorithm that 
ultimately relied on computing determinantsjfl Shortly afterward, Terhal and DiVincenzo ^58j (see 
also Knill [33]) pointed out that matchgate circuits were equivalent to systems of noninteracting 
fermion^: in that sense, one could say Valiant had "rediscovered fermions"! Indeed, Valiant's 
matchgate model can be seen as the direct counterpart of the model studied in this paper, but with 
noninteracting fermions in place of noninteracting bosonsH At a very high level. Valiant's model is 
easy to simulate classically because the determinant is in P, whereas our model is hard to simulate 
because the permanent is ^^P-complete. 

Ironically, when the quantum Monte Carlo method [11] is used to approximate the ground states 
of many-body systems, the computational situation regarding bosons and fermions is reversed. 
Bosonic ground states tend to be easy to approximate because one can exploit non-negativity, 
while fermionic ground states tend to be hard to approximate because of cancellations between 
positive and negative terms, what physicists call "the sign problem." 

Quantum computing and ^P-compIete problems. Since amplitudes in quantum me- 
chanics are the sums of exponentially many complex numbers, it is natural to look for some formal 
connection between quantum computing and the class #P of counting problems. In 1993, Bern- 
stein and Vazirani |9j proved that BQP C However, this result says only that #P is an 
upper bound on the power of quantum computation, so the question arises of whether solving 
^P-complete problems is in any sense necessary for simulating quantum mechanics. 

To be clear, we do not expect that BQP = P*''; indeed, it would be a scientific revolution even 
if BQP were found to contain NP. However, already in 1999, Fenner, Green, Homer, and Pruim 
|21j noticed that, if we ask more refined questions about a quantum circuit than 

"does this circuit accept with probability greater than 1 — e or less than e, promised that 
one of those is true ?, " 

then we can quickly encounter ^P-completeness. In particular, Fenner et al. showed that de- 
ciding whether a quantum circuit accepts with nonzero or zero probability is complete for the 

®Or rather, a closely-related matrix function called the PfafEan. 

^Strictly speaking, unitary matchgate circuits are equivalent to noninteracting fermions (Valiant also studied 
matchgates that violated unitarity). 

^However, the noninteracting-boson model is somewhat more complicated to define, since one can have multiple 
bosons occupying the same state, whereas fermions are prohibited from this by the Pauli exclusion principle. This 
is why the basis states in our model are lists of nonnegative integers, whereas the basis states in Valiant's model are 
binary strings. 

®See also Rudolph [48] for a direct encoding of quantum computations by matrix permanents. 
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complexity class coC=P. Since 

p#P c NP™^-P, this means that the problem is ^^P-hard under 

nondeterministic reductions. 

Later, Aaronson [2j defined the class PostBQP, or quantum polynomial-time with postselection 
on possibly exponentially- unlikely measurement outcomes. He showed that PostBQP is equal to 
the classical class PP. Since P^"^ = P'^^, this says that quantum computers with postselection can 
already solve ^T^P-complete problems. Following [lOj, in Section 14.21 we will use the PostBQP = 
PP theorem to give an alternative proof of Theorem [H which does not require using the ^P- 
completeness of the permanent. 

Quantum speedups for sampling and search problems. Ultimately, we want a hardness 
result for simulating real quantum experiments, rather than postselected ones. To achieve that, 
a crucial step in this paper will be to switch attention from decision problems to sampling and 
search problems. The value of that step in a quantum computing context was recognized in several 
previous works. 

In 2008, Shepherd and Bremner [50] defined and studied a fascinating subclass of quantum com- 
putations, which they called "commuting" or "temporally-unstructured." Their model is probably 
not universal for BQP, and there is no known example of a decision problem solvable by their model 
that is not also in BPP. However, if we consider sampling problems or interactive protocols, then 
Shepherd and Bremner plausibly argued (without formal evidence) that their model might be hard 
to simulate classically. 

Recently, and independently of us, Bremner, Jozsa, and Shepherd [10] showed that commuting 
quantum computers can sample from probability distributions that cannot be efficiently sampled 
classically, unless PP = BPPpath and hence the polynomial hierarchy collapses to the third level. 
This is analogous to our Theorem [H except with commuting quantum computations instead of 
noninteracting-boson ones. 

Previously, in 2002, Terhal and DiVincenzo [59] showed that constant-depth quantum circuits 
can sample from probability distributions that cannot be efficiently sampled by a classical computer, 
unless BQP C AM. By using our arguments and Bremner et al.'s |10j . it is not hard to strengthen 
Terhal and DiVincenzo's conclusion, to show that exact classical simulation of their model would 
also imply PP = PostBQP = BPPpath, and hence that the polynomial hierarchy collapses. 

However, all of these results (including our Theorem[T|) have the drawback that they only address 
sampling from exactly the same distribution D as the quantum algorithm — or at least, from some 
distribution in which all the probabilities are multiplicatively close to the ideal ones. Indeed, in 
these results, everything hinges on the ^P-completeness of estimating a single, exponentially-small 
probability p. For this reason, such results might be considered "cheats": presumably not even 
the quantum device itself can sample perfectly from the ideal distribution Dl What if we allow 
"realistic noise," so that one only needs to sample from some probability distribution D' that is 
1/ poly (n)-c/ose to T> in total variation distance? Is that still a classically-intractable problem? 
This is the question we took as our starting point. 

Oracle results. We know of one previous work that addressed the hardness of sampling 
approximately from a quantum computer's output distribution. In 2010, Aaronson [3] showed 
that, relative to a random oracle A, quantum computers can sample from probability distributions 
D that are not even approximately samplable in BPP^"^ (that is, by classical computers with 
oracles for the polynomial hierarchy). Relative to a random oracle A, quantum computers can 
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also solve search problems not in BPP . The point of these results was to give the first formal 
evidence that quantum computers have "capabilities outside PH." 

For us, though, what is more relevant is a striking feature of the proofs of these results. Namely, 
they showed that, if the sampling and search problems in question were in BPP'^'^ , then (via a 
nonuniform, nondeterministic reduction) one could extract small constant-depth circuits for the 
2^-bit Majority function, thereby violating the celebrated circuit lower bounds of Hastad [541 
and others. What made this surprising was that the 2"-bit Majority function is j^P-completeP*^! 
In other words, even though there is no evidence that quantum computers can solve #P-complete 
problems, somehow we managed to prove the hardness of simulating a BQP machine by using the 
hardness o/ #P. 

Of course, a drawback of Aaronson's results [3] is that they were relative to an oracle. However, 
just like Simon's oracle algorithm [53j led shortly afterward to Shor's algorithm [52j, so too in this 
case one could hope to "reify the oracle": that is, find a real, unrelativized problem with the same 
behavior that the oracle problem illustrated more abstractly. That is what we do here. 



2 Preliminaries 

Throughout this paper, we use Q to denote M (0, 1)^^, the complex Gaussian distribution with mean 

and variance E^^g |z|^ = 1. (We often use the word "distribution" for continuous probability 

measures, as well as for discrete distributions.) We will be especially interested in ^'^^"^ the 
distribution over n x n matrices with i.i.d. Gaussian entries. 

For m > n, we use Um,n to denote the set of matrices A £ (^"^x"- whose columns are orthonormal 
vectors, and y.rn,n to denote the Haar measure over l/(m,n- So in particular, 7im,m is the Haar 
measure over the set Um,m of m x m unitary matrices. 

We use a to denote the complex conjugate of a. We denote the set {1, . . . , n} by [n] . Let v £ 

and A £ C"^". Then \\v\\ := Y^|fi|^ + • • • + \vn\'^, and ||^|| := max||^||=i Equivalently, 
ll^ll = i^max {A) is the largest singular value of A. 

We generally omit floor and ceiling signs, when it is clear that the relevant quantities can be 
rounded to integers without changing the asymptotic complexity. Likewise, we will talk about a 
polynomial-time algorithm receiving as input a matrix A £ C"^", often drawn from the Gaussian 
distribution ^"X". Here it is understood that the entries of A are rounded to p (n) bits of precision, 
for some polynomial p. In all such cases, it will be straightforward to verify that there exists a 
fixed polynomial p, such that none of the relevant calculations are affected by precision issues. 

We assume familiarity with standard computational complexity classes such as BQP (Bounded- 
Error Quantum Polynomial-Time) and PH (the Polynomial Hierarchy) We now define some 
other complexity classes that will be important in this work. 

Definition 8 (PostBPP and PostBQP) Say the algorithm A "succeeds" if its first output bit is 
measured to be 1 and 'fails" otherwise; conditioned on succeeding, say A "accepts" if its second 
output bit is measured to be 1 and "rejects" otherwise. Then PostBPP is the class of languages 
L C {0, 1}* for which there exists a probabilistic polynomial-time algorithm A such that, for all 
inputs x: 



^"Here we are abusing terminology (but only slightly) by speaking about the #P-completeness of an oracle problem. 
Also, strictly speaking we mean PP-complete — but since P'''' = P*'', the distinction is unimportant here. 
^^See the Complexity Zoo, www.complexityzoo.com, for definitions of these and other classes. 
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(i) Pii:[A{x) succeeds] > 0. 

(ii) If X £ L then Pr [A{x) accepts \ A{x) succeeds] > |. 



(in) If x ^ L then Pr (x) accepts \ A{x) succeeds] < g. 

PostBQP is defined the same way, except that A is a quantum algorithm rather than a classical 
one. 

PostBPP is easily seen to equal the complexity class BPPpath, which was defined by Han, Hemas- 
paandra, and Thierauf [28]. In particular, it follows from Han et al.'s results that MA C PostBPP 
and that P[["^ C PostBPP C BPP[["^, where P[|"^ and BPP^^^ denote P and BPP respectively with 
nonadaptive queries to an NP oracle. As for PostBQP, we have the following result of Aaron- 
son [2], which characterizes PostBQP in terms of the classical complexity class PP (Probabilistic 
Polynomial-Time) . 

Theorem 9 (Aaronson [2]) PostBQP = PP. 

It is well-known that P^'^ = P*'^ — and thus, Theorem [9] has the surprising implication that 
BQP with postselection is as powerful as an oracle for counting problems. 

Aaronson [2] also observed that, just as intermediate measurements do not affect the power of 
BQP, so intermediate postselected measurements do not affect the power of PostBQP. 

2.1 Sampling and Search Problems 

In this work, a central role is played not only by decision problems, but also by sampling and 
search problems. By a sampling problem S, we mean a collection of probability distributions 
{'Dx) xe{o 1}* ^ one for each input string x € {0,1}". Here T>x is a distribution over {0, 1}^^"\ for 
some fixed polynomial p. To "solve" S means to sample from T>x, given x as input, while to solve 
S approximately means (informally) to sample from some distribution that is 1/ poly (n)-close to 
T>x in variation distance. In this paper, we will be interested in both notions, but especially 
approximate sampling. 

We now define the classes SampP and SampBQP, consisting of those sampling problems that 
are approximately solvable by polynomial-time classical and quantum algorithms respectively. 

Definition 10 (SampP and SampBQP) SampP is the class of sampling problems S = {T>x)xe{o ly 
for which there exists a probabilistic polynomial-time algorithm A that, given (^x,0^^^^ as input]^ 
samples from a probability distribution T>'^ such that ]]T>'^ — T>x]] < e. SampBQP is defined the 
same way, except that A is a quantum algorithm rather than a classical one. 

Another class of problems that will interest us are search problems (also confusingly called 
"relation problems" or "function problems"). In a search problem, there is always at least one 
valid solution, and the problem is to find a solution: a famous example is finding a Nash equilibrium 
of a game, the problem shown to be PPAD-complete by Daskalakis et al. [17]. More formally, a 
search problem is a collection of nonempty sets {Bx)^^^q ly , one for each input x G {0,1}". 



^^Giving ^XjO^'^^y as input (where 0^''^ represents 1/e encoded in unary) is a standard trick for forcing an algo- 

rithm's running time to be polynomial in n as well as 1/e. 
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Here C {0, 1}^^"^ for some fixed polynomial p. To solve R means to output an element of B^, 
given X as input. 

We now define the complexity classes FBPP and FBQP, consisting of those search problems that 
are solvable by BPP and BQP machines respectively. 

Definition 11 (FBPP and FBQP) FBPP is the class of search problems R = {Bx)^^^^ for 

which there exists a probabilistic polynomial-time algorithm A that, given (|x,0^/^) as input, produces 
an output y such that Pr [y G B^] > 1 — e, where the probability is over A 's internal randomness. 
FBQP is defined the same way, except that A is a quantum algorithm rather than a classical one. 

Recently, and directly motivated by the present work, Aaronson [4J proved a general connection 
between sampling problems and search problems. 

Theorem 12 (Sampling/ Searching Equivalence Theorem [4]) Let S = {1^x)x^{o i}* 
approximate sampling problem. Then there exists a search problem Rs = (-Bx)a;g{o i}* ^^^^ ^■^ 
"equivalent" to S in the following two senses. 

(i) Let O be any oracle that, given as input, outputs a sample from a distribution Cx 

such that \\Cx — T^xW < ds we vary the random string r. Then Rs G FBPP'^. 

(ii) Let M be any probabilistic Turing machine that, given (x, as input, outputs an element 

Y G Bx with probability at least 1 — 5. Then S G SampP*^. 

Briefly, Theorem [12] is proved by using the notion of a "universal randomness test" from algo- 
rithmic information theory. Intuitively, given a sampling problem S, we define an "equivalent" 
search problem Rs as follows: "output a collection of strings Y = {yi, . . . ,yT) in the support of 
Vx, most of which have large probability in Dx and which also, conditioned on that, have close- 
to-maximal Kolmogorov complexity." Certainly, if we can sample from Dx, then we can solve this 
search problem as well. But the converse also holds: if a probabilistic Turing machine is solving 
the search problem Rs, it can only be doing so by sampling approximately from T>x. For otherwise, 
the strings yi,. . . ,yT would have short Turing machine descriptions, contrary to assumption. 

In particular. Theorem [T2] implies that S G SampP if and only if Rs G FBPP, S G SampBQP if 
and only if Rs G FBQP, and so on. We therefore obtain the following consequence: 

Theorem 13 (g]) SampP = SampBQP if and only i/FBPP = FBQP. 

3 The Noninteracting-Boson Model of Computation 

In this section, we develop a formal model of computation based on identical, noninteracting bosons: 
as a concrete example, a linear-optical network with single-photon inputs and nonadaptive photon- 
number measurements. This model will yield a complexity class that, as far as we know, is 
intermediate between BPP and BQP. The ideas behind the model have been the basis for optical 
physics for almost a century. To our knowledge, however, this is the first time the model has been 
presented from a theoretical computer science perspective. 

Like quantum mechanics itself, the noninteracting-boson model possesses a mathematical beauty 
that can be appreciated even independently of its physical origins. In an attempt to convey that 
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beauty, we will define the model in three ways, and also prove those ways to be equivalent. The first 
definition, in Section I3.H is directly in terms of physical devices (beamsplitters and phaseshifters) 
and the unitary transformations that they induce. This definition should be easy to understand 
for those already comfortable with quantum computing, and makes it apparent why our model can 
be simulated on a standard quantum computer. The second definition, in Section 13.21 is in terms 
of multivariate polynomials with an unusual inner product. This definition, which we learned from 
Gurvits [27j, is the nicest one mathematically, and makes it easy to prove many statements (for 
example, that the probabilities sum to 1) that would otherwise require tedious calculation. The 
third definition is in terms of permanents of n x n matrices, and is what lets us connect our model 
to the hardness of the permanent. The second and third definitions do not use any quantum 
formalism. 

Finally, Section 13.41 defines BosonSampling, the basic computational problem considered in 
this paper, as well as the complexity class Boson FP of search problems solvable using a Boson- 
Sampling oracle. It also proves the simple but important fact that BosonFP C FBQP: in other 
words, boson computers can be simulated efficiently by standard quantum computers. 



3.1 Physical Definition 

The model that we are going to define involves a quantum system of n identical photon^^ and m 
modes (intuitively, places that a photon can be in). We will usually be interested in the case where 
n < m < poly (n), though the model makes sense for arbitrary n and m0 Each computational 
basis state of this system has the form IS*) = . . . , Sm), where Sj represents the number of photons 
in the z*'^ mode (sj is also called the i^^ occupation number). Here the Sj's can be any nonnegative 
integers summing to n; in particular, the Sj's can be greater than 1. This corresponds to the fact 
that photons are bosons, and (unlike with fermions) an unlimited number of bosons can be in the 
same place at the same time. 

During a computation, photons are never created or destroyed, but are only moved from one 
mode to another. Mathematically, this means that the basis states IS") of our computer will always 
satisfy S G ^m,n, where $m,n is the set of tuples S = (si, . . . , Sm) satisfying si, . . . ,Sm > and 
si + ■ ■ ■ + Sm = n. Let M = |$m,n| be the total number of basis states; then one can easily check 
that M= ("^+^-1). 

Since this is quantum mechanics, a general state of the computer has the form 

where the a^'s are complex numbers satisfying "^se^mn \^s\'^ = 1- In other words, \ip) is a unit 
vector in the M-dimensional complex Hilbert space spanned by elements of ^m,n- Call this Hilbert 
space Hfyi,n- 

Just like in standard quantum computing, the Hilbert space Hm,n is exponentially large (as 
a function of m + n), which means that we can only hope to explore a tiny fraction of it using 
polynomial-size circuits. On the other hand, one difference from standard quantum computing is 
that Hm,n is not built up as the tensor product of smaller Hilbert spaces. 

^^For concreteness, we will often talk about photons in a linear-optical network, but the mathematics would be the 
same with any other system of identical, noninteracting bosons (for example, bosonic excitations in solid-state) . 

^*The one caveat is that our "standard initial state," which consists of one photon in each of the first n modes, is 
only defined if n < m. 
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Throughout this paper, we will assume that our computer starts in the state 



In) := |1,...,1,0, 



...,0) 



where the first n modes contain one photon each, and the remaining m — n modes are unoccupied. 
We call the standard initial state. 

We will also assume that measurement only occurs at the end of the computation, and that 
what is measured is the number of photons in each mode. In other words, a measurement of the 



But which unitary transformations can we perform on the state \ip), after the initialization and 
before the final measurement? For simplicity, let us consider the special case where there is only 
one photon; later we will generalize to n photons. In the one-photon case, the Hilbert space Hm,i 
has dimension M = m, and the computational basis states (| 1, 0, . . . , 0) , |0, 1, 0, . . . , 0) , etc.) simply 
record which mode the photon is in. Thus, a general state is just a unit vector in C": that is, a 
superposition over modes. 

In standard quantum computing, we know that any unitary transformation on n qubits can be 
decomposed as a product of gates, each of which acts nontrivially on at most two qubits, and is 
the identity on the other qubits. Likewise, in the linear optics model, any unitary transformation 
on m modes can be decomposed into a product of optical elements, each of which acts nontrivially 
on at most two modes, and is the identity on the other m — 2 modes. The two best-known optical 
elements are called phaseshifters and beamsplitters. A phaseshifter multiplies a single amplitude 
as by e*^, for some specified angle 6, and acts as the identity on the other m — 1 amplitudes. A 
beamsplitter modifies two amplitudes as and ax as follows, for some specified angle 6: 



It acts as the identity on the other m — 2 amplitudes. It is easy to see that beamsplitters and 
phaseshifters generate all optical elements (that is, all 2 x 2 unitaries). Moreover, the optical 
elements generate all m x m unitaries, as shown by the following lemma of Reck et al. |46j : 

Lemma 14 (Reck et al. [46]) Let U be any mx m unitary matrix. Then one can decompose U 
as a product U = Ut ■ ■ - Ui, where each Ut is an optical element (that is, a unitary matrix that acts 
nontrivially on at most 2 modes and as the identity on the remaining m — 2 modes). Furthermore, 
this decomposition has size T = O (m^), and can be found in time polynomial in m. 

Proof Sketch. The task is to produce U starting from the identity matrix — or equivalently, to 
produce / starting from U — by successively multiplying by block-diagonal unitary matrices, each 
of which contains a single 2x2 block and m — 2 blocks consisting of iJil To do so, we use a 
procedure similar to Gaussian elimination, which zeroes out the — m off-diagonal entries of U 
one by one. Then, once U has been reduced to a diagonal matrix, we use m phaseshifters to 
produce the identity matrix. ■ 

^^Such matrices are the generalizations of the so-called Givens rotations to the complex numbers. 
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We now come to the more interesting part: how do we describe the action of an optical element 
on multiple photons? In the case of a phaseshifter, it is relatively obvious what should happen. 
Namely, phaseshifting the i^^ mode by angle 9 should multiply the amplitude by e*^ once for each 
of the Si photons in mode i. In other words, it should effect the diagonal unitary transformation 

|si, . . . , SfYi) ^ c' * • • • , S771) • 

However, it is much less obvious how to describe the action of a beamsplitter on multiple photons. 

As it turns out, there is a natural homomorphism ip, which maps an mxm unitary transforma- 
tion U acting on a single photon to the corresponding M x M unitary transformation ip (U) acting 
on n photons. Since (f is a homomorphism, Lemma [H] implies that we can specify f merely by 
describing its behavior on 2 x 2 unitaries. For given an arbitrary mx m unitary matrix U, we can 
write (p{U) as 

^{Ut---Ui) = ^{Ut)---v{Ui), 

where each Ut is an optical element (that is, a block-diagonal unitary that acts nontrivially on at 
most 2 modes). So let 




be any 2x2 unitary matrix, which acts on the Hilbert space i?2,i spanned by |1, 0) and |0, 1). Then 
since ip (U) preserves photon number, we know it must be a block-diagonal matrix that satisfies 

{s,t\ip{U) \u,v) = 

whenever s-\-t 7^ u + v. But what about when s + t = u + v? Here the formula for the appropriate 
entry oi ip{U) is 

k+e=u, k<s, f<t ^ ^ ^ ^ 

One can verify by calculation that ip {U) is unitary; however, a much more elegant proof of unitarity 
will follow from the results in Section [3.21 

One more piece of notation: let Vu be the probability distribution over S G ^m,n obtained by 
measuring the state 99 {U) in the computational basis. That is, 

Pr [S] = \{U^{U)\S)\\ 

Notice that Vu depends only on the first n columns of U . Therefore, instead of writing Vu it will 
be better to write Va, where A G Um,n is the m x n matrix corresponding to the first n columns 
off/. 

3.2 Polynomial Definition 

In this section, we present a beautiful alternative interpretation of the noninteracting-boson model, 
in which the "states" are multivariate polynomials, the "operations" are unitary changes of variable, 
and a "measurement" samples from a probability distribution over monomials weighted by their 
coefficients. We also prove that this model is well-defined (i.e. that in any measurement, the 
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probabilities of the various outcomes sum to 1), and that it is indeed equivalent to the model from 
Section 13.11 Combining these facts yields the simplest proof we know that the model from Section 
13. H is well-defined. 

Let m > n. Then the "state" of our computer, at any time, will be represented by a multi- 
variate complex-valued polynomial p {xi, . . . .Xm) of degree n. Here the Xj's can be thought of as 
just formal variables!^ The standard initial state corresponds to the degree-n polynomial 
Jm,n (xi, ■ ■ ■ , Xm) xi ■ ■ ■ Xn, where Xl, . . . ,Xn are the first n variables. To transform the state, 
we can apply any m x m unitary transformation U we like to the vector of Xj's: 



/ ^1 ^ 




/ un ■ 













) 









The new state of our computer is then equal to 

n 

U [Jm.n] (xi, . . . 'Xm) — Jm.n (xi, . . . -Xm) — i'^ilXl + ■ ■ ■ ~l~ UimXm) 



i=l 



Here and throughout, we let L [p] be the polynomial obtained by starting with p and then applying 
the m X m linear transformation L to the variables. 

After applying one or more unitary transformations to the Xj's, we then get a single opportunity 
to measure the computer's state. Let the polynomial p at the time of measurement be 



p{xi,. . . .Xm) = ^ asxl^ 

S={si,...,Sm) 



Xrr, 



where S ranges over ^m,n (i-e., lists of nonnegative integers such that si + ■ ■ ■ + Sm = n). Then the 
measurement returns the monomial • • • (or equivalently, the list of integers 5 = (si, . . . , Sm)) 
with probability equal to 

Pr [S] := \as\^ si! • • • s^!. 
From now on, we will use x as shorthand for xi, . . . .Xm, and x^ as shorthand for the monomial 



Given two polynomials 



p{x)= ^ asx'^, 



s 



<l{x) = ^ bsx 

we can define an inner product between them — the so-called Fock-space inner product — as follows: 



{p, q) :-- 



E 



asbss} 



I ... .s^ I 



S'=(si,...,Sm)e<I'm,n 

The following key result gives a more intuitive interpretation of the Fock-space inner product. 



^For physicists, they are "creation operators.' 
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Lemma 15 (Interpretation of Fock Inner Product) {p,q) = E^^gm \p[x) q{x)], where Q is 
the Gaussian distribution M (0,1) 

Proof. Since inner product and expectation are linear, it suffices to consider the case where p and q 
are monomials. Suppose p (x) = x^ and q (x) = x^ , for some R = {ri, . . . , rm) and S = (si, . . . , Sm) 
in $m,n- Then 

E \p (x) q {x)\ = E \x^x^] . 

li p ^ q — that is, if there exists an i such that rj ^ Si — then the above expectation is clearly 0, 
since the Gaussian distribution is uniform over phases. If p = on the other hand, then the 
expectation equals 



E 



I |2si I |2s„ 

(y> _, ... T" 

•^1 \^m\ 



E 



Si' • • • s ' 



1 2s, 



E 

Xm^g 



I 25^1 



asbssil • • • Sm! 



We conclude that 

E [p(x)g(x)] = 
xr^g"^ 

5=(si,...,Sm)e$m,n 

as desired. ■ 

Recall that U [p] denotes the polynomial p (Ux), obtained by applying the m x m linear trans- 
formation U to the variables x = (xi, . . . ,Xm) of p. Then Lemma [T5] has the following important 
consequence. 

Theorem 16 (Unitary Invariance of Fock Inner Product) {p, q) = {U [p] , U [q]) for all poly- 
nomials p, q and all unitary transformations U. 

Proof. We have 



{U[p],U[q])= E U\p]{x)U[q]ix) 
xr^g"^ L 

= E [p{Ux)q{Ux)] 
x^g"^ 

= E [p(x)^(x)] 

x^g"^ 

= {p, q) , 

where the third line follows from the rotational invariance of the Gaussian distribution. ■ 
Indeed, we have a more general result: 

Theorem 17 {p, L [q]) = (^L^ [p] , for all polynomials p, q and all linear transformations L. (So 
in particular, if L is invertible, then {p,q) = {L^^ [p] ,L [q]).) 

Proof. Let p{x) = Ese-J-^.n "^^a;"^ ^'^'^ ^ (^) = E5e<i.„,„ ^s^;^. First suppose L is a diagonal 
matrix, i.e. L = diag (A) for some A = (Ai, . . . , Am)- Then 

{p,L[q])= Yl as{bsX'')sil---sJ. 

S={si,...,Sm)e^m,n 



E 

S={si,...,Sm)&^m,n 

[p] , 



asX^ 1 bssil • • • Sm! 
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Now note that we can decompose an arbitrary L as UKV, where A is diagonal and U, V are unitary. 
So 

{p,L[q\) = {p,UKV [q]) 

Vt[p],Ay [q] 

'K'^U^[p\,V[q] 
'LHp],q 



I . . . s I 



where the second and fourth hues fohow from Theorem [TBI ■ 
We can also define a Fock-space norm as follows: 

\\pfFock = (P^P) = Yl 

S={si,...,Sm) 

1 1 2 

Clearly HpHpock — *-* P- ^^^^ have the following: 

Corollary 18 || ^ [>/m,n] Upock ~ ^ f'^^ unitary matrices U. 
Proof. By Theorem [TUl 

■ 

Corollary [18] implies, in particular, that our model of computation based on multivariate poly- 
nomials is well-defined: that is, the probabilities of the various measurement outcomes always sum 
to 1 1 [</m,n] 1 1 Pock ~ '^^^ show that the polynomial-based model of this section is equiv- 

alent to the linear-optics model of Section I3.1[ As an immediate consequence, this implies that 
probabilities sum to 1 in the linear-optics model as well. 

Given any pure state 



in Hm,n, let P|^) be the multivariate polynomial defined by 

r> f \ \^ ^^^^ 

In particular, for any computational basis state \S), we have 

P\s) (x) 



V^i! • • • Sm! 
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Theorem 19 (Equivalence of Physical and Polynomial Definitions) < — > P\^) defines 
an isomorphism between quantum states and polynomials, which commutes with inner products and 
unitary transformations in the following senses: 



p. 



U[P\ 



\i>)\ 



Proof. That {'4'\<t)) = (i-]^^, i-]^^) follows immediately from the definitions of i-]^^ and the Fock- 
space inner product. For Pip(jj)\ip) = U [P^], notice that 



Ml 



U 



E 

S={si,...,Sm)&^m,n 



asx" 



E 

S'=(si,...,Sm)e4>„ 



as 



Vsil • • • 



Y[ (uilXl H h UimXmY" 



i=l 



So in particular, transforming P^^^ to U [P\ip)] simply effects a linear transformation on the co- 
efficients on P\i(,)- This means that there must be some M x M linear transformation (p{U), 
depending on U, such that U [P\ip)] = Pip{u)\ip)- Thus, in defining the homomorphism U — )• ip (U) 
in equation ([T]), we simply chose it to yield that linear transformation. This can be checked by 
explicit computation. By Lemma [T^ we can restrict attention to a 2 x 2 unitary matrix 



U 



a b 
c d 



By linearity, we can also restrict attention to the action of (p {U) on a computational basis state 
\s,t) (or in the polynomial formalism, the action of C/ on a monomial x^y*). Then 

U [x^y^] = {ax + byf {cx + dy)^ 



kJ \£ 



EE 

k=0 1=0 

E E 

u+v=s+t k+e=u, k<s, e<t 



a'^V c d^ 



kj \e 



aH'-'^cU^-^x'^y''. 



Thus, inserting normalization. 



U 



^Syt 

7^ 



E 

u+v=s+t 



< U\V\ 



E 



k+t=u, k<s, e<t 



a^b'-'^c^d^-^ 



which yields precisely the definition of {U) from equation ([T|). ■ 
As promised in Section [3Tl we can also show that 99 {U) is unitary. 



Corollary 20 p{U) is unitary. 
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Proof. One definition of a unitary matrix is that it preserves inner products. Let us check that 
this is the case for ip [U). For all [/, we have 



{m = {p\^).p\<t>)) 

= {U [P|^)],C/[ii^)]> 
= {iP\v{U)^^{U)\<P) 

where the second line follows from Theorem 1161 a-nd all other lines from Theorem [191 ■ 
3.3 Permanent Definition 

This section gives a third interpretation of the noninteracting-boson model, which makes clear its 
connection to the permanent. Given an n x n matrix A = {aij) G C"^", recall that the permanent 
is 

n 

Per (A) = n^i.-W- 

Also, given an m x m matrix V , let Vn^n be the top-left nx n submatrix of V. Then the following 
lemma establishes a direct connection between Per (Vn^n) and the Fock-space inner product defined 
in Section 13. 2[ 

Lemma 21 Per (Vn^n) = {Jm.n, ^ [Jm,n]) for any mxm matrix V . 
Proof. By definition, 

n 

V [Jm,n\ = JJ {mXl H h VimXm) ■ 

i=l 

Then {Jm,n,V [Jm,n]) is just the coefficient of Jm,n = Xf-Xn in the above polynomial. This 
coefficient can be calculated as 



■ 

Combining Lemma [2T] with Theorem 1171 we immediately obtain the following: 
Corollary 22 Per ((WVF)^ ^) = {V [J„,„] , W [Jm,n]) for any two matrices V,W e C"^x™. 
Proof. 

Per (^(ytp^)^ J = (jm,n,^^H^[J™,n]) = {V [J„,,n] ,W [J^,n]) ■ 

m 

Now let U be any mxm unitary matrix, and let S = (si, . . . , Sm) and T = (ti, . . . , tm) be any 
two computational basis states (that is, elements of ^m,n)- Then we define an n x n matrix Us^t 
in the following manner. First form an m x n matrix Ut by taking tj copies of the j*'^ column of 
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U, for each j G [m]. Then form the n x n matrix Us,t by takmg Sj copies of the i*'* row of Ut, for 
each i G [m]. As an example, suppose 



Note that if the Sj's and tj's are all or 1, then Us,t is simply an n x n submatrix of C/. If some 
Sj's or tj's are greater than 1, then Us,t is like a submatrix of C/, but with repeated rows and/or 
columns. 

Here is an alternative way to define Us^t- Given any S S ^m,n, let Is be a linear substitution 
of variables, which maps the variables xi, . . . , to xi, the variables x^^+i, . . . , x^^+sj to X2, and 
so on, so that Is [xi • • • x„] = x^^ • • • xf^. (If i > n, then Is [xj] = 0.) Then one can check that 



(Note also that ip{Is) |ln) = \S).) 

Theorem 23 (Equivalence of All Three Definitions) For all m x m unitaries U and basis 
states S,T £ ^m,n, 




and 5' = T= (0,1,2). Then 





Per {Us,t) = { 



x^, U [x^] > = {S\v (U) \T) ^sil---smlti\---tj. 



Proof. For the first equality, from Corollary [22] we have 



X^, U [X^] > = {Is [Jrn,n] , U It [Jm,n]) 




Per {Us,t) . 



For the second equality, from Theorem [19] we have 



{S\^{U)\T) = {P\s),P^(u)\T)) 




• • • SYn}t\ \ • • • t^n} 
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3.4 Bosonic Complexity Theory 

Having presented the noninteracting-boson model from three perspectives, we are finahy ready to 
define BosonSampling, the central computational problem considered in this work. The input 
to the problem will be Given A, together 

with a basis state 5 G ^m,n — that is, a list S = {si, . . . , s^) of nonnegative integers, satisfying 
si + ■ ■ ■ + Sm = n — let As be the n x n matrix obtained by taking Si copies of the i^^ row of A, for 
all i G [m]. Then let "Da be the probability distribution over $m,n defined as follows: 



(Theorem 1231 implies that Da is indeed a probability distribution, for every A G Um,n-) The goal 
of BosonSampling is to sample either exactly or approximately from P^, given A as input. 

Of course, we also could have defined Va as the distribution over $m,n obtained by first com- 
pleting A to any m x m unitary matrix U, then measuring the quantum state (p (U) in the 
computational basis. Or we could have defined Va as the distribution obtained by first applying 
the linear change of variables U to the polynomial xi • • • x„ (where again U is any m x ni unitary 
completion of A), to obtain a new m- variable polynomial 

U[xi---Xn\= ^ asx^, 



5e<i>„ 



and then letting 



Pr [S] = \as\'^ si\ ■ ■ ■ Sm^. 



\{x^, U [xi 



For most of the paper, though, we will find it most convenient to use the definition of Va in terms 
of permanents. 

Besides the BosonSampling problem, we will also need the concept of an exact or approximate 
BosonSampling oracle. Intuitively, a BosonSampling oracle is simply an oracle O that solves 
the BosonSampling problem: that is, O takes as input a matrix A G Um,n, and outputs a 
sample from Va- However, there is a subtlety, arising from the fact that O is an oracle for a 
sampling problem. Namely, it is essential that Cs only source of random randomness be a string 
r G {0, l}P°iy(") that is also given to O as input. In other words, if we fix r, then O {A,r) must 
be deterministic, just like a conventional oracle that decides a language. Of course, if O were 
implemented by a classical algorithm, this requirement would be trivial to satisfy. 

More formally: 

Definition 24 (BosonSampling oracle) Let O he an oracle that takes as input a string r G 
{0,1}^°^^^"^ an m X n matrix A G Um.n, o,n-d an error bound e > encoded as 0"*^/^. Also, let 
Vq [A, e) he the distribution over outputs of O if A and e are fixed but r is uniformly random. We 
call O an exact BosonSampling oracle ifVa {A,e) = Va for all A G Um,n- Also, we call O an 
approximate BosonSampling oracle if \\Vo {A,e) - Va\\ < e for all A G Um,n and e > 0. 



^''Here we assume each entry of A is represented in binary, so that it has the form (x + yi) /2''^"-' , where x and y are 
integers and p is some fixed polynomial. As a consequence, A might not be exactly column-orthonormal — but as long 
as A'^ A is exponentially close to the identity, A can easily be "corrected" to an element oi'Um,n using Gram-Schmidt 
orthogonalization. Furthermore, it is not hard to show that every element of Um,n can be approximated in this 
manner. See for example Aaronson T for a detailed error analysis. 
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If we like, we can define the complexity class Boson FP, to be the set of search problems 
R = (-Ba;)^g|Q that are in FBPP*"' for every exact BosonSampling oracle O. We can also 

define Boson FPg to be the set of search problems that are in FBPP^ for every approximate Boson- 
Sampling oracle O. We then have the following basic inclusions: 

Theorem 25 FBPP C BosonFPe = BosonFP C FBQP. 

Proof. For FBPP C BosonFP^, just ignore the BosonSampling oracle. For BosonFP^ C BosonFP, 

note that any exact BosonSampling oracle is also an e-approximate one for every e. For the 
other direction, BosonFP C BosonFPe, let M be a BosonFP machine, and let O be M's exact 
BosonSampling oracle. Since M has to work for every O, we can assume without loss of generality 
that O is chosen uniformly at random, consistent with the requirement that T>o (^) = T^A for every 
A. We claim that we can simulate O to sufficient accuracy using an approximate BosonSampling 
oracle. To do so, we simply choose e <^ S/p{n), where p{n) is an upper bound on the number of 
queries to O made by M, and 6 is the desired failure probability of M. 

For BosonFP C FBQP, we use an old observation of Feynman [22] and Abrams and Lloyd [6]: 
that fermionic and bosonic systems can be simulated efficiently on a standard quantum computer. 
In more detail, our quantum computer's state at any time step will have the form 

(si,...,Sm)G<I>m,n 

That is, we simply encode each occupation number < Sj < n in binary using [log2 n] qubits. 
(Thus, the total number of qubits in our simulation is m [log2 n] .) To initialize, we prepare 
the state = |1, . . . , 1, 0, . . . , 0); to measure, we measure in the computational basis. As for 
simulating an optical element: recall that such an element acts nontrivially only on two modes i 
and j, and hence on 2 [log2 n] qubits. So we can describe an optical element by an O (n^) x O (n^) 
unitary matrix U — and furthermore, we gave an explicit formula ([I]) for the entries of U. It follows 
immediately, from the Solovay-Kitaev Theorem (see [32]), that we can simulate U with error e, 
using poly (n, log 1/e) qubit gates. Therefore an FBQP machine can simulate each call that a 
BosonFP machine makes to the BosonSampling oracle. ■ 

4 Efficient Classical Simulation of Linear Optics Collapses PH 

In this section we prove Theorem [H our hardness result for exact BosonSampling. First, in 
Section 14.11 we prove that P*^ C BPP'^^ , where O is any exact BosonSampling oracle. In 
particular, this implies that, if there exists a polynomial-time classical algorithm for exact Boson- 
Sampling, then P'^'^ = BPP'^^ and hence the polynomial hierarchy collapses to the third level. The 
proof in Section 14.11 directly exploits the fact that boson amplitudes are given by the permanents 
of complex matrices X G C"^", and that approximating Per (X) given such an X is #P-complete. 
The main lemma we need to prove is simply that approximating |Per(X)|^ is also #P-complete. 
Next, in Section [4.21 we give a completely different proof of Theorem [TJ This proof repurposes 
two existing results in quantum computation: the scheme for universal quantum computing with 
adaptive linear optics due to Knill, Laflamme, and Milburn [35], and the PostBQP = PP theorem 
of Aaronson [2j. Finally, in Section \4.3\ we observe two improvements to the basic result. 
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4.1 Basic Result 



First, we will need a classic result of Stockmeyer |55| . 

Theorem 26 (Stockmeyer [55j) Given a Boolean function f : {0, 1}" — )• {0, 1}, let 

p = Pr [f(x) = l] = — y / (x) . 

x6{0,l} 

T/ien /or all g > 1 + ^^^y^^, i/iere exists an FBPP'^^''^ machine that approximates p to within a 
multiplicative factor of g. 

Intuitively, Theorem says that a BPP'^^ machine can always estimate the probability p that 
a polynomial-time randomized algorithm accepts to within a 1/ poly (n) multiplicative factor, even 
if p is exponentially small. Note that Theorem [26] does not generalize to estimating the probability 
that a quantum algorithm accepts, since the randomness is "built in" to a quantum algorithm, and 
the BPP'^^ machine does not get to choose or control it. 

Another interpretation of Theorem |26| is that any counting problem that involves estimating 
the sum of 2" nonnegative real numbera^H can be approximately solved in BPP'^'^. 

By contrast, if a counting problem involves estimating a sum of both positive and negative num- 
bers — for example, if one wanted to approximate Ea,g|o^i}" [/ (x)], for some function / : {0, 1}" — )• 
{ — 1,1} — then the situation is completely different. In that case, it is easy to show that even 
multiplicative approximation is ^^P-hard, and hence unlikely to be in FBPP'^'^. 

We will show this phenomenon in the special case of the permanent. If X is a non-negative 
matrix, then Jerrum, Sinclair, and Vigoda [30] famously showed that one can approximate Per (X) 
to within multiplicative error e in poly (n, 1 /e) time (which improves on Theorem [26| by getting 
rid of the NP oracle). On the other hand, let X £ M"^" be an arbitrary real matrix, with both 
positive and negative entries. Then we will show that multiplicatively approximating Per (X)'^ = 
|Per (X)r is #P-hard. The reason why we are interested in |Per {X)\ , rather than Per (X) itself, 
is that measurement probabilities in the noninteracting-boson model are the absolute squares of 
permanents. 

Our starting point is a famous result of Valiant [62j : 

Theorem 27 (Valiant [62j) The following problem is ^P- complete: given a matrix X £ {0,1}""^", 
compute Per (X). 

We now show that Per (X)^ is #P-hard to approximate. 

Theorem 28 (Hardness of Approximating Per (X)'^) The following problem is ^P-hard, for 
any g G [1, poly (n)].- given a real matrix X £ M"^", approximate Per (X)'^ to within a multiplicative 
factor of g. 

Proof. Let O be an oracle that, given a matrix M G M"^", outputs a nonnegative real number 
O (M) such that 

?^^<OiM)<gPeriMf. 



^^Strictly speaking, Theorem 1261 talks about estimating the sum of 2" binary ({0, l}-valued) numbers, but it is 
easy to generahze to arbitrary nonnegative reals. 



30 



Also, let X = {xij) G {0, 1}" " be an input matrix, which we assume for simplicity consists only 
of Os and Is. Then we will show how to compute Per {X) exactly, in polynomial time and using 
0((7n^logn) adaptive queries to O. Since Per (X) is #P-complete by Theorem 1271 this will 
immediately imply the lemma. 

Since X is non-negative, we can check in polynomial time whether Per [X) = 0. If Per {X) = 
we are done, so assume Per {X) > 1. Then there exists a permutation a such that a;i,(T(i) = • • • = 
Xn,a(n) = 1- By permuting the rows and columns, we can assume without loss of generality that 

Xii — • • • — Xnn — !• 

Our reduction will use recursion on n. Let Y = (yij) be the bottom-right (n — 1) x (n — 1) 
submatrix of X. Then we will assume inductively that we already know Per (y). We will use 
that knowledge, together with O (gnlogn) queries to O, to find Per (X). 

Given a real number r, let G M"^" be a matrix identical to X, except that the top-left 
entry is xn — r instead of Xn. Then it is not hard to see that 

Per fxM) = Per (X) - r Per {Y) . 



Note that yu = ■■■ = y(^n~i),{n-i) = li so Per (y) > 1. Hence there must be a unique value 
r = r* such that Per (X^'"*!) = 0. Furthermore, if we can find that r* , then we are done, since 
Per {X) = r* Per (Y). 
To find 

Per^ 
Per (Y) ' 

we will use a procedure based on binary search. Let r (0) := be our "initial guess"; then we will 
repeatedly improve this guess to r (1), r (2), etc. The invariant we want to maintain is that 

o (xi'<'«)i) < °<-^f") 

for all t. 

To find r (t + 1) starting from r (t): first observe that 

\r(t) ,*\_ \rit)PeriY)-Fer{X)\ 
' ' " Per (Y) 



I Per {X^'- 



< 



Per (y) 

^g-o{xim) 



Per (y) 

where the last line follows from Per (M)^ /g ^ O (M). So setting 



^9 ■ O jXirm) 
^ Per (y) ' 

we find that r* is somewhere in the interval / := [r (t) — (3,r{t) + /?]. Divide I into L equal 
segments (for some L to be determined later), and let s (1) , . . . , s (L) be their left endpoints. Then 
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the procedure is to evaluate O (Xl*^*)!) for each i S [L], and set r (t + 1) equal to the s (i) for which 
O is minimized (breaking ties arbitrarily). 

Clearly there exists an i S [L] such that \s (i) — r*\ < /3/L — and for that particular choice of i, 
we have 



= 5 (Per {X)-s{i)Per {Y)f 

= g (Per {X) - {s (i) - r*) Per (Y) - r* Per {Y)f 

= g {s (i) - r*f Fer {Yf 

<g^FeT{Yf 

^2 



Therefore, so long as we choose L > ^/2g, we find that 



2 



which is what we wanted. 
Now observe that 

O =0{X)<g Per {Xf < g (n!)^ . 

So for some T = O (n log n), 



4ff 

By equation ([2]), this in turn implies that 



Jg.O{X[riT)]) 



1 



Per(y) 2Per(y)' 

But this means that we can find r* exactly, since r* equals a rational number p^|^|y| , where Per {X) 
and Per {Y) are both positive integers and Per {Y) is known. ■ 

Let us remark that one can improve Theorem [28l to ensure that the entries of X are all at 
most poly (n) in absolute value. We do not pursue that here, since it will not be needed for our 
application. 

Lemma 29 Let X £ C"^". Then for all m > 2n and e < 1/ ||^||, there exists an m x m unitary 
matrix U that contains eX as a submatrix. Furthermore, U can be computed in polynomial time 
given X. 

Proof. Let Y = eX. Then it suffices to show how to construct a 2n x n matrix W whose columns 
are orthonormal vectors, and that contains Y as its top nx n submatrix. For such a W can easily 
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be completed to an m x n matrix whose columns are orthonormal (by filling the bottom m — 2n 
rows with zeroes), which can in turn be completed to an m x m unitary matrix in O (m^) time. 

Since ||y|| < e \\X\\ < 1, we have Y^Y ^ I in the semidefinite ordering. Hence / — Y^Y 
is positive semidefinite. So / — Y^Y has a Cholesky decomposition / — Y'^Y = Z"^ Z, for some 

Z G C"^". Let us set W := . Then W'^W = Y^^Y + Z'^Z = /, so the columns of W are 

orthonormal as desired. ■ 

We are now ready to prove Theorem [1} that P*^ C BPP'^^ for any exact BosonSampling 
oracle O. 

Proof of Theorem [H Given a matrix X G M"^" and a parameter g G 1 + poiy(n) ' P°^y (^) ; 
we know from Theorem [28] that it is #P-hard to approximate Per(X)^ to within a multiplicative 
factor of g. So to prove the theorem, it suffices to show how to approximate Per (X)^ in FBPP'^'^ . 

Set m := 2n and e := 1/ \\X\\ > 2-P°iy("). Then by Lemma[29l we can efficiently construct 
an m X m unitary matrix U with Un,n = ^X as its top-left n x n submatrix. Let A be the m x n 
column-orthonormal matrix corresponding to the first n columns of U. Let us feed A as input to 
O, and consider the probability pA that O outputs 1„. We have 

PA = Pr [O iA,r) = In] 

r 

= \{ln\v^iU)\ln)\^ 
= \FeT{Un,n)f 

= £2" I Per 

where the third line follows from Theorem[23l But by Theorem[26l we can approximate pA to within 
a multiplicative factor of g in FBPP'^^ . It follows that we can approximate |Per = Per (X)'^ 

in FBPP^P" as weh. ■ 

The main fact that we wanted to prove is an immediate corollary of Theorem [H 

Corollary 30 Suppose exact BosonSampling can be done in classical polynomial time. Then 
P*"^ = BPP'^'^, and hence the polynomial hierarchy collapses to the third level. 

Proof. Combining the assumption with Theorem [H we get that P*^ C BPP'^^, which by Toda's 
Theorem [60] implies that P*^ = PH = = BPP^^. ■ 

Likewise, even if exact BosonSampling can be done in BPP'^^ (that is, using an oracle for 
some fixed level of the polynomial hierarchy), we still get that 

p#P c BPP^P'' = BPPPH = PH, 

and hence PH collapses. 

As another application of Theorem[Tl suppose exact BosonSampling can be done in BPP^''°'^'^^^Q^: 
that is, using an oracle for BQP decision problems. Then we get the containment 

p#P c BPP^P 

Such a containment seems unlikely (though we admit to lacking a strong intuition here), thereby 
providing possible evidence for a separation between BQP sampling problems and BQP decision 
problems. 
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4.2 Alternate Proof Using KLM 

Inspired by recent work of Bremner et al. [lOJ, in this section we give a different proof of Theorem [TJ 
This proof makes no use of permanents or approximate counting; instead, it invokes two previous 
quantum computing results — the KLM Theorem [35] and the PostBQP = PP theorem [2j — as 
black boxes. Compared to the first proof, the second one has the advantage of being shorter 
and completely free of calculations; also, it easily generalizes to many other quantum computing 
models, besides noninteracting bosons. The disadvantage is that, to those unfamiliar with [351 [2], 
the second proof gives less intuition about why Theorem [T] is true. Also, we do not know how to 
generalize the second proof to say anything about the hardness of approximate sampling. For that, 
it seems essential to talk about the Permanent or some other concrete ^^P-complete problem. 

Our starting point is the KLM Theorem, which says informally that linear optics augmented 
with adaptive measurements is universal for quantum computation. A bit more formally, define 
BosonPadap to be the class of languages that are decidable in BPP (that is, classical probabilistic 
polynomial-time), augmented with the ability to prepare /c-photon states (for any k = poly(n)) 
in any of m = poly (n) modes; apply arbitrary optical elements to pairs of modes; measure the 
photon number of any mode at any time; and condition future optical elements and classical 
computations on the outcomes of the measurements. From Theorem 125^ it is not hard to see that 
BosonPadap ^ BQP. The amazing discovery of Knill et al. [35] was that the other direction holds 
as well: 

Theorem 31 (KLM Theorem [35]) BosonPadap = BQP. 

In the proof of Theorem [311 a key step is to consider a model of linear optics with postselected 
measurements. This is similar to the model with adaptive measurements described above, except 
that here we guess the outcomes of all the photon-number measurements at the very beginning, 
and then only proceed with the computation if the guesses turn out to be correct. In general, 
the resulting computation will only succeed with exponentially-small probability, but we know 
when it does succeed. Notice that, in this model, there is never any need to condition later 
computational steps on the outcomes of measurements — since if the computation succeeds, then 
we know in advance what all the measurement outcomes are anyway! One consequence is that, 
without loss of generality, we can postpone all measurements until the end of the computation. 

Along the way to proving Theorem [311 Knill et al. [35] showed how to simulate any postse- 
lected quantum computation using a postselected linear-optics computation!^ To formalize the 
"Postselected KLM Theorem," we now define the complexity class PostBosonP, which consists of 
all problems solvable in polynomial time using linear optics with postselection. 

Definition 32 (PostBosonP) PostBosonP is the class of languages L C {0,1}* for which there 
exist deterministic polynomial-time algorithms V,A,B such that for all inputs x G {0, 1}^; 

(i) The output ofVisanmxn matrix V (x) G Um,n (for some m,n = poly (N) ), corresponding 
to a linear- optical network that samples from the probability distribution Dyf^^y 

(a) Pry^x,^(^j [^(y) accepts] > 0. 

^^Terhal and DiVincenzo [59| later elaborated on their result, using the term "nonadaptive quantum computation" 
(or QCnad) for what we call postselection. 
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(in) If X £ L then Prj/~X)^,(^) {u) accepts \ A{y) accepts] > |. 

(iv) If X ^ L then Piyr^D^^^^ iu) accepts \ A{y) accepts] < i. 

In our terminology, Knill et al. [3^ showed that PostBosonP captures the full power of postse- 
lected quantum computation — in other words, of the class PostBQP defined in Section [2j We now 
sketch a proof for completeness. 

Theorem 33 (Postselected KLM Theorem [35j) PostBosonP = PostBQP. 

Proof Sketch. For PostBosonP C PostBQP, use the procedure from Theorem l25l to create an ordi- 
nary quantum circuit C that simulates a given linear-optical network U. Note that the algorithms 
A and B from Definition [32] can simply be "folded" into C, so that A (y) accepting corresponds 
to the first qubit of C's output being measured to be and B (y) accepting corresponds to the 
second qubit of C's output being measured to be 

The more interesting direction is PostBQP C PostBosonP. To simulate BQP in PostBosonP, the 
basic idea of KLM is to use "nondeterministic gates," which consist of sequences of beamsplitters 
and phaseshifters followed by postselected photon-number measurements. // the measurements 
return a particular outcome, then the effect of the beamsplitters and phaseshifters is to implement 
(perfectly) a 2-qubit gate that is known to be universal for standard quantum computation. We 
refer the reader to [35j for the details of how such gates are constructed; for now, assume we have 
them. Then for any BQP machine M, it is easy to create a PostBosonP machine M' that simulates 
M. But once we have BQP, we also get PostBQP essentially "free of charge." This is because the 
simulating machine M' can postselect, not only on its nondeterministic gates working correctly, 
but also (say) on M reaching a final configuration whose first qubit is |1). ■ 

We can now complete our alternative proof of Theorem [H that P*^ C BPP^*^ for any exact 
BosonSampling oracle O. 

Proof of Theorem [H Let O be an exact BosonSampling oracle. Then we claim that 
PostBosonP C PostBPP*^. To see this, let V, A, B be the polynomial-time Turing machines from 
Definition [32l Then we can create a PostBPP'^ machine that, given an input x and random string 
r: 

(i) "Succeeds" if A {O {V (x) , r)) accepts, and "fails" otherwise. 

(ii) Conditioned on succeeding, accepts if B {O iV {x) , r)) accepts and rejects otherwise. 
Then 

PP = PostBQP = PostBosonP C PostBPP^ C BPP^^", 

where the first equality comes from Theorem [9] and the second from Theorem [33l Therefore 
p#P ^ pPP is contained in BPP^'^° as well. ■ 

4.3 Strengthening the Result 

In this section, we make two simple but interesting improvements to Theorem [TJ 

The first improvement is this: instead of considering a whole collection of distributions, we 
can give a fixed distribution T>n (depending only on the input size n) that can be sampled by a 
boson computer, but that cannot be efficiently sampled classically unless the polynomial hierarchy 
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collapses. This P„ will effectively be a "complete distribution" for the noninteracting-boson model 
under nondeterministic reductions. Let us discuss how to construct such a P„, using the approach 
of Section Mi 

Let p (n) be some fixed polynomial (say n^), and let C be the set of all quantum circuits on n 
qubits with at most p (n) gates (over some finite universal basis, such as {Hadamard, Toffoli} 
|51j). Then consider the following PostBQP algorithm A, which takes as input a description of a 
circuit C* € C. First, generate a uniform superposition 



over descriptions of all circuits C S C. Then measure |C) in the standard basis, and postselect on 
the outcome being |C*). Finally, assuming |C*) was obtained, take some fixed universal circuit U 
with the property that 

Pr \U (|C)) accepts] ^ Pr [C (0") accepts] 

for all C G C, and run U on input \C*). Now, since PostBQP = PostBosonP by Theorem [33} 
it is clear that A can be "compiled" into a postselected linear-optical network A! . Let P_4' be 
the probability distribution sampled by Al if we ignore the postselection steps. Then Dj^ is our 
desired universal distribution T)^. 

More concretely, we claim that, if P„ can be sampled in FBPP, then P#P = PH = BPP^^. To 
see this, let O (r) be a polynomial-time classical algorithm that outputs a sample from P„, given 
as input a random string r G {0, l}P°^y('^). Then, as in the proof of Theorem [1] in Section [4.21 we 
have PostBosonP C PostBPP. For let V^A^B be the polynomial-time algorithms from Definition 
[32] Then we can create a PostBPP machine that, given an input x and random string r: 

(1) Postselects on O (r) containing an encoding of the linear-optical network V {£). 

(2) Assuming |y(a;)) is observed, simulates the PostBosonP algorithm: that is, "succeeds" if 
A {O (r)) accepts and fails otherwise, and "accepts" if B {O (r)) accepts and rejects otherwise. 

Our second improvement to Theorem [1] weakens the physical resource requirements needed 
to sample from a hard distribution. Recall that we assumed our boson computer began in the 
"standard initial state" ]!„) := Jl, . . . , 1, 0, . . . , 0), in which the first n modes were occupied by a 
single boson each. Unfortunately, in the optical setting, it is notoriously difficult to produce a 
single photon on demand (see Section [6] for more about this). Using a standard laser, it is much 
easier to produce so-called coherent states, which have the form 



la) :=e-H —in) 



oo „ 



n=0 

for some complex number a. (Here |n) represents a state of n photons.) However, we now observe 
that the KLM-based proof of Theorem [T] goes through almost without change, if the inputs are 
coherent states rather than individual photons. The reason is that, in the PostBosonP model, we 
can first prepare a coherent state (say \a = 1)), then measure it and postselect on getting a single 
photon. In this way, we can use postselection to generate the standard initial state |ln), then run 
the rest of the computation as before. 
Summarizing the improvements: 
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Theorem 34 There exists a family of distributions {'Dn}n>i7 depending only on n, such that: 

(i) For all n, a boson computer with coherent- state inputs can sample from T>n in poly (ra) time. 

(a) Let O be any oracle that takes as input a random string r (which O uses as its only source 
of randomness) together with n, and that outputs a sample On{r) from P„- Then P*'^ C 

gppNpO 

5 Main Result 

We now move on to prove our main result: that even approximate classical simulation of boson 
computations would have surprising complexity consequences. 

5.1 Truncations of Haar-Random Unitaries 

In this section we prove a statement we will need from random matrix theory, which seems new and 
might be of independent interest. Namely: any m^^^ x m^^^ suhmatrix of an mxm Haar-random 
unitary matrix is close, in variation distance, to a matrix of i.i.d. Gaussians. It is easy to see 
that any individual entry of a Haar unitary matrix is approximately Gaussian. Thus, our result 
just says that any small enough set of entries is approximately independent — and that here, "small 
enough" can mean not only a constant number of entries, but even m^^^-* of them. This is not 
surprising: it simply means that one needs to examine a significant fraction of the entries before 
one "notices" the unitarity constraint. 

Given m > n, recall that Um,n is the set of m x n complex matrices whose columns are or- 
thonormal vectors, and 7im,n is the Haar measure over l/(m,n- Define Sm,n to be the distribution 
over nxn matrices obtained by first drawing a unitary U from T-lm,m, and then outputting \/rnUn^n 
where C/n,n is the top-left nxn submatrix of U. In other words, Sm,n is the distribution over nxn 
truncations of m x m Haar unitary matrices, where the entries have been scaled up by a factor of 
^/m so that they have mean and variance 1. Also, recall that is the probability distribution 
over nxn complex matrices whose entries are independent Gaussians with mean and variance 
1. Then our main result states that Sm,n is close in variation distance to g^^^: 

Theorem 35 Let m > ^ log^ f , for any d > 0. Then \\Sm,n - ^""""H =0{5). 

The bound m > ^ log j is almost certainly not tight; we suspect that it can be improved (for 
example) to m = O (n^/5) . For our purposes, however, what is important is simply that m is 
polynomial in n and 1/5. 

Let Pg,Ps '■ C"^" —7- be the probability density functions of and Sm,n respectively (for 
convenience, we drop the subscripts m and n). Then for our application, we will actually need the 
following stronger version of Theorem [35l 

Theorem 36 (Haar-Unitary Hiding Theorem) Let m > ^ log^ j. Then 

PS (X) < {l + 0{6))pG {X) 

for allX £ C"^". 
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Fortunately, Theorem [36] will follow fairly easily from our proof of Theorem [35l 
Surprisingly, Theorems [35] and [36] do not seem to have appeared in the random matrix theory 
literature, although truncations of Haar unitary matrices have been studied in detail. In particular, 
Petz and Reffy [44] showed that the truncated Haar-unitary distribution Sm,n converges to the 
Gaussian distribution, when n is fixed and m — )• oo. (Mastrodonato and Tumulka [H] later gave 
an elementary proof of this fact.) In a followup paper, Petz and Reffy [45] proved a large deviation 
bound for the empirical eigenvalue density of matrices drawn from Sm,n (see also Reffy's PhD thesis 
[47]). We will use some observations from those papers, especially an explicit formula in [47] for 
the probability density function of Sm,n- 

We now give an overview of the proof of Theorem [35l Our goal is to prove that 

A{pG,Ps):= [ \pG{X)-ps{X)\dX 

is small, where the integral (like all others in this section) is with respect to the Lebesgue measure 
over the entries of X. 

The first crucial observation is that the probability distributions and Sm,n are both invari- 
ant under left-multiplication or right-multiplication by a unitary matrix. It follows that pG {X) and 
ps {X) both depend only on the list of singular values of X. For we can always write X = (xij) 
as UDV, where U,V are unitary and D = (dij) is a diagonal matrix of singular values; then 
Pg (X) = pg (D) and ps (X) = ps (D). Let Aj := df^ be the square of the i^^ singular value of X. 
Then from the identity 

kijl^ = ^ (3) 

i,j£[n] ie[n] 

we get the following formula for pG- 



e 

vr tt''" ' ' 



-A, 



Also, Reffy |47. p. 61] has shown that, provided m > 2n, we have 

PS {X) = Cm,n n ( 1 - - ^^'<™ 



m 

ie\n\ - 



for some constant Cm,n, where Ix-<m equals 1 if Aj < m and otherwise. Here and throughout, 
the Aj's should be understood as functions Aj {X) of X. 

Let Amax := maxj Aj be the greatest squared spectral value of X. Then we can divide the space 
(^nxn q£ j-Qa,trices into two parts: the head Rhead, consisting of matrices X such that Amax < k, and 
the tail i?taii) consisting of matrices X such that Amax > A:, for a value k < ^ that we will set 
later. At a high level, our strategy for upper-bounding A {pg,Ps) will be to show that the head 
distributions are close and the tail distributions are small. More formally, define 



5hcad 



[ PG (X) dX, 

cad 

Shead := / Ps{X) dX, 

Ahead := / \pG{X)-ps{X)\dX, 
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and define ^taii, Staii, and Ataii similarly with integrals over /?taii- Note that ^head + fftaii = 
•Shead + Staii = 1 by normalization. Also, by the triangle inequality, 

A iPG,Ps) = Ahead + Atail < Ahead + S'tail + Stail- 

So to upper-bound A {pg-,Ps)i it suffices to upper-bound (^taib Staii, and Ahead separately, which we 
now proceed to do in that order. 

Lemma 37 (7taii < n^e"'^/"^. 

Proof. We have 

S-tail = Pr [Amax > k] 



< Pr 



< V Pr 



I 1 2 ^ 



where the second line uses the identity ([3]) and the third line uses the union bound. ■ 
Lemma 38 Staii < n^e-^/^^'^''\ 

Proof. Recall that 'Hm,m is the Haar measure over m x m unitary matrices. Then for a single 
entry (say uu) of a matrix U = (uij) drawn from 'Hm,m, 



Pr 



|uii| > r 



(1-r) 



m— 1 



for all r G [0, 1] , which can be calculated from the density function given by Reffy ^\ for the case 
n = 1. So as in Lemma \37\ 



■Stail = Pr [Amax > k] 



< Pr 



i,je[n] \-^i3 



< 



n Pr 



I 1 



I 1 2 ^ 

Fill > n 

m—l 



< n2g-fc(l-l/m)/n2 
<^2g-fc/(2n2)_ 
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The rest of the proof is devoted to upper-bounding Ahead) the distance between the two head 
distributions. Recall that Reffy's formula for the density function ps {X) (equation (jlj)) involved 
a multiplicative constant Cm,n- Since it is difficult to compute the value of Cm,n explicitly, we will 
instead define 

and consider the scaled density function 



Ps{X) ■.= C-Ps{X) = ^\{ 



1 \ m—2n 
A,: 



Ai<m' 

vr"" -^-^ V m ■ 



We will first show that pc and ps are close on iihcad- We will then deduce from that result, 
together with the fact that gtaii and staii are small, that pc and ps must be close on i?head) which is 
what we wanted to show. Strangely, nowhere in this argument do we ever bound C directly. After 
proving Theorem 1351 however, we will then need to go back and show that (" is close to 1, on the 
way to proving Theorem 1361 
Let 



Ahead := / \pGiX)-psiX)\dX. (5) 

cad 

Then our first claim is the following. 
Lemma 39 Ahead < 

Proof. As a first observation, when we restrict to i?head) we have Xi < k < < m for all i G [n\ 
by assumption. So we can simplify the expression for ps {X) by removing the indicator variable 



Now let us rewrite equation ([5]) in the form 



ie[n] 



\ \ m—2n 
Aj 



Ahead = / Vg{X) 



^ Ps{X) 



PG {X) 



dX. 



Then plugging in the expressions for ps {X) and pc [X) respectively gives the ratio 

\m—2n 




where 



/-I \ I \m—2n 

/ = In — 



e 

Ai - (m - 2n) (- In (1 - Xi/m)) . 
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Since < < m, we may use the Taylor expansion 



/- ^ / s A7 1 1 A^ 
m 2 m'^ 3 rw^ 



So we can upper-bound / (Aj) by 



/ (Ai) <\i-(m- 2n) 
2nXi 



m 



< 



m 
2nk 

m ' 



and can lower-hound f (Aj) by 



/(A,)>A,-(m-2n)(^ + i4 + ^4 + 

A?; A? Af 



(m — 2n) Aj 



m rrfi 



= \i 

> Xi 

> - 

> - 



m (1 — Xi/m) 
Xj 

1 — Xi/m 
A? 



m — Xi 
m 



Here the last line used the fact that Xi < k < ^ < since X G -Rhead- It follows that 



m 



ie n 



m 



So 



1 - exp ^ / (Ai 



< max ^ 1 — exp 

< max 
< 



m 



,exp 



2n2A: 



m 



j 2nk'^ An^k 
4n/c (n + fc) 



where the last line used the fact that e° — 1 <25 for all 5 < 1. 
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To conclude, 

ink (n + k) 



Ahead < / Pg{X) 



< 



cad 

4n/e (n + k) 



m 



dX 



m 



Combining Lemmas [371 EH ESI and SOl and making repeated use of the triangle inequality, we 
find that 



Ahead = / \pG{X)-ps{X)\dX 

< Ahead + / \ps{X)-ps{X)\dX 



Therefore 



^£-Rhcad 

Ahead ~l~ IC^head '^headl 

< Ahead + ICShead — S'headI + IS'head " 1| + |1 " SheadI 

< 2Ahead + 5tail + Stail 

< + ^) + ^2g-fc/n2 ^ ^2g-fc/(2n2)_ 

m 

^(PCPs) < Ahead + S'tail + ^tail 

m 

Recalling that m > ^ log^ j, let us now make the choice k := 6n^log^. Then the constraint 
k < is satisfied, and furthermore A {pciPs) = O (6). This completes the proof of Theorem 1351 
The above derivation "implicitly" showed that C is close to 1. As a first step toward proving 
Theorem [36l let us now make the bound on explicit. 

Lemma 40 |C - 1\ = O (6) . 

Proof. We have 

IC^head " Sheadl < IC'Shead " 5head| + Ifl'head " 1| + |1 " Sheadl 
= Ahead + 9tail + ^tail 



^ 4nk{n + k) ^ ^2„-k/n^ , „2 

and 



m 



,5 



-I ^ 1 2 -fc/(2n2) 

Shead = 1 - Stail > 1 - n e ' ^ 

As before, recall that m > ^ log^ j and set k := 6n^ log ^. Then 

IC^head ^headl 



IC-il 



< 



•Shead 

Ank (n + k) /m + n^e"*^/"^ + n^e"*^/*^^"^) 

1 _ j^2g-fc/(2n2) 

0(5). 
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We can now prove Theorem ESI that ps {X) < (1 + O {6))pg (X) for all X G C"^". 
Proof of Theorem 1361 Our goal is to upper-bound 

^ PS jX) 
C := max t-^tt- 

xeC"X" pg [X) 

Using the notation of Lemma \39\ we can rewrite C as 

- max -— = - max exp > / (Aj 

where 



lie n 



/ (Ai) := Ai + (m - 2n) In (1 - A,/m) . 

By elementary calculus, the function / (A) achieves its maximum at A = 2n; note that this is a 
valid maximum since m > 2n. Setting Aj = 2n for all i then yields 

1 / f 2n 
C = — exp ( 2n^ + n (m — 2n) In ( 1 

C V V 

< Ig2n2g-2n2(m-2n)/m 



= 1 + O (5) . 

Here the second-to-last line used Lemma HOj together with the fact that m ^ 



5.2 Hardness of Approximate BosonSampling 

Having proved Theorem [36l we are finally ready to prove the main result of the paper: that 
|GPE|^ e FBPP'^^ , where O is any approximate BosonSampling oracle. In other words, if 
there is a fast classical algorithm for approximate BosonSampling, then there is also a BPP'^^ 
algorithm to estimate |Per (^)|^, with high probability for a Gaussian random matrix X ~ 

We first need a technical lemma, which formalizes the well-known concept of rejection sampling. 

Lemma 41 (Rejection Sampling) Let D = {px} o-nd E = {qx} be any two distributions over a 
finite set S. Suppose that there exists a polynomial-time algorithm to compute Qqx/Px given x £ S, 
where C is some constant independent of x such that |C ~ 1| ^ ^- Suppose also that Qx/Px < 1 + 5 
for all x G S. Then there exists a BPP algorithm TZ that takes a sample x ^ T) as input, and either 
accepts or rejects. TZ has the following properties: 

(i) Conditioned on TZ accepting, x is distributed according to £. 
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(ii) The probability that TZ rejects (over both its internal randomness and x ^ D) is O (5). 



Proof. TZ works as follows: first compute CQx/Px', then accept with probability ^^^'^^^ ^ 1- Prop- 
erty (i) is immediate. For property (ii), 

Pr [TZ rejects] = ( 1 - j^^^^] 

V (1 + 5) / 

Px 



x£S 

c 



xes 



1 



{1 + Sf 



0{6). 



By combining Lemma HT] with Theorem 1361 we now show how it is possible to "hide" a matrix 
X ~ C/"^"' of i.i.d. Gaussians as a random n x n submatrix of a Haar-random m x n column- 
orthonormal matrix A, provided m = Q (n^log^n). Our hiding procedure does not involve any 
distortion of X. We believe that the hiding procedure could be implemented in BPP; however, 
we will show only that it can be implemented in BPP'^^, since that is easier and suffices for our 
application. 

Lemma 42 (Hiding Lemma) Let m > ^ log^ j for some 6 > 0. Then there exists a BPP^^ 
algorithm A that takes as input a matrix X ~ ^"^"^ that "succeeds" with probability 1 — O {6) over 
X, and that, conditioned on succeeding, samples a matrix A G hlm,n from a probability distribution 
T)x, such that the following properties hold: 

(i) Xj^fm occurs as a uniformly-random nxn submatrix of A ^ T^x, for every X such that 
Pr [^(X) succeeds] > 0. 

(ii) The distribution over A G ([;;™-x" induced by drawing X ~ gnxn^ running A{X), and condi- 
tioning onA{X) succeeding is simply T-Lm,n (the Haar measure overmxn column- orthonormal 
matrices). 

Proof. Given a sample X ~ gnxn^ ^j^g first step is to "convert" X into a sample from the 
truncated Haar measure Syyi^yi. To do so, we use the rejection sampling procedure from Lemma 
HTl By Theorem ESI we have ps [X) jpQ {X) < 1 + O (5) for all X G C"^*^, where ps and pc are 

2 

the probability density functions of 5m,,n and ^"X" respectively. Also, letting := (l/vr)" /c.m,n 
be the constant from Section 15.11 we have 



C • VS (X) _ PS (X) _ n^eln] (1 - ^^/mr 

Pg{X) pg{X) W^e[n]^-^' 

which is clearly computable in polynomial time (to any desired precision) given X. Finally, we 
saw from Lemma HOl that |C — 1| = O {5). 

So by Lemma HH the rejection sampling procedure TZ has the following properties: 
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(1) TZ can be implemented in BPP. 

(2) TZ rejects with probability O (6). 

(3) Conditioned on TZ accepting, we have X 

Now suppose TZ accepts, and let X' := X/ y/m. Then our problem reduces to embedding X' as 
a random submatrix of a sample A from T-Lm,n- We do this as follows. Given a matrix A E Um,m 
let Ex (A) be the event that X' occurs as an n x n submatrix of A. Then let T>x be the distribution 
over A E ly(m,n obtained by first sampling A from 7im,n, and then conditioning on Ex (A) holding. 
Note that Vx is well-defined, since for every X in the support of Sm,n, there is some A E ZYm,n 
satisfying Ex (A). 

We now check that 2?x satisfies properties (i) and (ii). For (i), every element in the support 
of Dx contains X' as a submatrix by definition, and by symmetry, this X' occurs at a uniformly- 
random location. For (ii), notice that we could equally well have sampled A ~ Dx by first 
sampling X ^ SfYi then placing X^ at a uniformly-random location withm A^ and finally "filling 
in" the remaining (m — n) x n block of A by drawing it from 7im,n conditioned on X' . From this 
perspective, however, it is clear that A is Haar-random, since Sf^i ^ was just a truncation of l~Lm^n 
to begin with. 

The last thing we need to show is that, given X as input, we can sample from Dx in BPP'^'^. As 
a first step, we can certainly sample from "H^jn in BPP. To do so, for example, we can first generate 
a matrix A ~ ^'"X" of independent Gaussians, and then apply the Gram-Schmidt orthogonalization 
procedure to A. Now, given a BPP algorithm that samples A ~ T-Lm,n, the remaining task is to 
condition on the event Ex (A). Given X and A, it is easy to check whether Ex (A) holds. But this 
means that we can sample from the conditional distribution Dx in the complexity class Post BP P. 

Composing a BPP algorithm with a PostBPP one yields an algorithm that runs in BP-PostBPP C 
BPP^P. ■ 

The final step is to prove that, if we had an oracle O for approximate BosonSampling, then 
by using O in conjunction with the hiding procedure from Lemma H2l we could estimate |Per 
in BPP^*^, where X ~ ^"x»^ ig a Gaussian input matrix. 

To prove this theorem, we need to recall some definitions from previous sections. The set of 
tuples S = (si, . . . , Sm) satisfying si, . . . ,Sm > and si -|- • • • -|- = "'^ is denoted ^m,n- Given a 
matrix A E h(m,n, we denote by Da the distribution over $m,,n where each S occurs with probability 

Va Si!---Sm! 

Also, recall that in the |GPE|^ problem, we are given an input of the form 0"*^/^, 0^/''), where 
X is an n X n matrix drawn from the Gaussian distribution Q^^^ , The goal is to approximate 
I Per (X)l^ to within an additive error e • n!, with probability at least 1 — 5 over X. 

We now prove Theorem [3l our main result. Let us restate the theorem for convenience: 

Let O he any approximate BosonSampling oracle. Then |GPE|^ E FBPP'^^°. 

Proof of Theorem [3l Let X ~ ^"X" i^g an input matrix, and let e, (5 > be error parameters. 
Then we need to show how to approximate |Per(X)|^ to within an additive error e ■ n\, with 
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probability at least 1 — 6 over X, in the complexity class FBPP'^^''. The running time should be 
polynomial in n, 1/e, and 1/5. 

Let m :— -yn^ log^ where is a suitably large constant. Also, let X' :— Xj ^fm be a scaled 
version of X. Then we can state our problem equivalently as follows: approximate 

lPe,-(X'V^-|P'='™l' 



to within an additive error e • n!/m". 

As a first step, Lemma 02] says that in BPP'^^, and with high probability over X\ we can 
generate a matrix A G that is exactly Haar-random, and that contains X' as a random nx n 

submatrix. So certainly we can generate such an A in FBPP^*^ (indeed, without using the oracle 
O). Provided we chose K sufficiently large, this procedure will succeed with probability at least 
(say) 1 - 5/4. 

Set /3 := £(5/24. Suppose we feed {A,0^^l^,r) to the approximate BosonSampling oracle O, 
where r G {0, l}P°'y(™') is a random string. Then by definition, as r is varied, O returns a sample 
from a probability distribution such that {{Va — T^'aW ^ Z^- 

Let ps ■= Pr^A i'^] ^^^^ Qs ■= P^o^ i^] ^ ^ ^m,n- Also, let W C [m] be the subset of n 

rows of A in which X' occurs as a submatrix. Then we will be particularly interested in the basis 
state S* = {si, . . . , Sm), which is defined hy Si = 1 if i £ W and Sj = otherwise. Notice that 

|Per(^5.)|^ IP (Y'W^ 



and that 



qs* = Pr [S*] = Pr 

re{o,i}P°'y('") 



In other words: ps* encodes the squared permanent that we are trying to approximate, while qs* 
can be approximated in FBPP'^'^ using Stockmeyer's approximate counting method (Theorem I26p. 
Therefore, to show that with high probability we can approximate ps* in FBPP"^^ , it suffices to 
show that ps* and qs* are close with high probability over X and A. 

Call a basis state S S $m,n collision-free if each Sj is either or 1. Let Gm,n be the set of 
collision-free 5's, and notice that S* G Gm,n- From now on, we will find it convenient to restrict 
attention to Gm,n- 

Let A5 := \ps — qs\-, so that 

Pa-V'a\\ = \ ^s- 



m.n 



Then 



E [A5] < 



S ^Gm,n I GfYi fi I 

_ 2\\Va-V'^\ 



< 



2/3 



<3/3-— , 
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where the last Une used the fact that m = uj {n?)- So by Markov's inequahty, for all A; > 1, 



Pr 

S&Grn.n 



ni 



As >3(3k- — 



m" 



1 



In particular, if we set k ■.= 4/6 and notice that 4/3A; = 12/3/5 = e/2, 



Pr 

S ^Gm.n 



Ac > 



e n\ 
2 ' m" 



< 



Of course, our goal is to upper-bound As*, not As for a randomly-chosen S £ Gm,n- However, 
a crucial observation is that, from the perspective of O — which sees only A, and not S* or X' — 
the distribution over possible values of S* is simply the uniform one. To see this, notice that 
instead of sampling X and then A (as in Lemma H2]) . we could have equally well generated the pair 
{X,A) by first sampling A from the Haar measure T-Lm,n, and then setting X := y/mAs*, for S* 
chosen uniformly from Gm,n- It follows that seeing A gives O no information whatsoever about 
the identity of S* . So even if O is trying adversarially to maximize A^*, we still have 



Pr 

X,A 



Ac. > - • 



2 m'^' 



6 

<4- 



Now suppose we use Stockmeyer's algorithm to approximate qs* in FBPP'^^ . Then by Theo- 
rem [26l for all a > 0, we can obtain an estimate qs* such that 

Pr [|gs- -qs*\> a- qs*] < 7^, 



in time polynomial in m and 1/a. Note that 



5 J ^"^'^-IG 



1 nl 
< 2 

V n I 



SO 



Pr 



qs > 2k 



m" 



< 



1 



for all /c > 1 by Markov's inequality, so 



Pr 

X,A 



qs* > 2k ■ 



n! 



m" 



< 



k 



by the same symmetry principle used previously for A^* . 

Let us now make the choice a := £(5/16 and k := 4/5. Then putting everything together and 
applying the union bound, 



Pr 



\qs* 



nl 

PS* > e - — 
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qs* > 2k 



< Pr 
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where the probabihties are over X and A as well as the internal randomness used by the approximate 
counting procedure. So, including the probability that the algorithm A from Lemma H2] fails, the 
total probability that our FBPP'^^ machine fails to output a good enough approximation to 
ps* = |Per {X')\^ is at most 

as desired. This completes the proof. ■ 
5.3 Implications 

In this section, we harvest some implications of Theorem[3]for quantum complexity theory. First, if 
a fast classical algorithm for BosonSampling exists, then it would have a surprising consequence 
for the classical complexity of the |GPE|_|_ problem. 

Corollary 43 Suppose BosonSamplingg SampP. Then |GPE|^ G FBPP'^'^. Indeed, even if 
BosONSAMPLiNce SampP"^^, then |GPE|^ e FBPP'^^. 

However, we would also like evidence that a boson computer can solve search problems that are 
intractable classically. Fortunately, by using Theorem [12] — the "Sampling/Searching Equivalence 
Theorem" — we can obtain such evidence in a completely automatic way. In particular, combining 
Corollary 1431 with Theorem 1121 yields the following conclusion. 

Corollary 44 There exists a search problem R G BosonFP such that |GPE|^ G FBPP'^^'^ for all 
computable oracles O that solve R. So in particular, if BosonFP C FBPP (that is, all search 
problems solvable by a boson computer are also solvable classically) , then |GPE|^ G FBPP'^^. 

Recall from Theorem 1251 that BosonFP C FBQP: that is, linear-optics computers can be simu- 
lated efficiently by "ordinary" quantum computers. Thus, Corollarv 1441 implies in particular that, 
if FBPP = FBQP, then |GPE|^ G FBPP^^. Or in other words: if |GPE|^ is #P-hard, then FBPP 
cannot equal FBQP, unless P*^ = BPP'^^ and the polynomial hierarchy collapses. This would 
arguably be our strongest evidence to date against the Extended Church- Turing Thesis. 

In Sections [71 [51 and[9l we initiate a program aimed at proving |GPE|^ is #P-hard. 

6 Experimental Prospects 

Our main goal in this paper was to define and study a theoretical model of quantum computing 
with noninteracting bosons. There are several ways to motivate this model other than practical 
realizability: for example, it abstracts a basic class of physical systems, it leads to interesting 
new complexity classes between BPP and BQP, and it helped us provide evidence that quantum 
mechanics in general is hard to simulate classically. (In other words, even if we only cared about 
"standard" quantum computing, we would not know how to prove results like Theorem [3] without 
using linear optics as a proof tool.) 

Clearly, though, a major motivation for our results is that they raise the possibility of actually 
building a scalable linear-optics computer, and using it to solve the BosonSampling problem. By 
doing this, one could hope to give evidence that nontrivial quantum computation is possible, without 
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Figure 3: The Hong-Ou-Mandel dip. 



having to solve all the technological problems of building a universal quantum computer. In other 
words, one could see our results as suggesting a new path to testing the Extended Church- Turing 
Thesis, which might be more experimentally accessible than alternative paths. 

A serious discussion of implementation issues is outside the scope of this paper. Here, though, 
we offer some preliminary observations that emerged from our discussions with quantum optics 
experts. These observations concern both the challenges of performing a BosonSampling exper- 
iment, and the implications of such an experiment for complexity theory. 

6.1 The Generalized Hong-Ou-Mandel Dip 

From a physics standpoint, the experiment that we are asking for is essentially a generalization of 
the Hong-Ou-Mandel dip [29] to three or more photons. The Hong-Ou-Mandel dip (see Figure 
[3|) is a well-known effect in quantum optics whereby two identical photons, which were initially in 
different modes, become correlated after passing through a beamsplitter that applies the Hadamard 
transformation. More formally, the basis state |1,1) evolves to 



so that a subsequent measurement reveals either both photons in the first mode or else both photons 
in the second mode. This behavior is exactly what one would predict from the model in Section 
[3l in which ra-photon transition amplitudes are given by the permanents of n x n matrices. More 
concretely, the amplitude of the basis state |1, 1) "dips" to because 



and hence there is destructive interference between the two paths mapping |1, 1) to itself. 

Our challenge to experimentalists is to confirm directly that the quantum-mechanical formula 
for n-boson transition amplitudes in terms of n x n permanents given in Section \3. 31 namely 
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continues to hold for large values of n. In other words, demonstrate a Hong-Ou-Mandel interfer- 
ence pattern involving as many identical bosons as possible (though even 3 or 4 bosons would be 
of interest here). 

The point of such an experiment would be to produce evidence that a linear-optical network can 
indeed solve the BosonSampling problem in a scalable way — and that therefore, no polynomial- 
time classical algorithm can sample the observed distribution over photon numbers (modulo our 
conjectures about the computational complexity of the permanent). 

Admittedly, since complexity theory deals only with asymptotic statements, no finite experiment 
can answer the relevant questions definitively. That is, even if formula ([6]) were confirmed in the 
case of 30 identical bosons, a true-believer in the Extended Church- Turing Thesis could always 
maintain that the formula would break down for 31 bosons, and so on. Thus, the goal here is 
simply to collect enough evidence, for large enough n, that the ECT becomes less tenable as a 
scientific hypothesis. 

Of course, one should not choose n so large that a classical computer cannot even efficiently 
verify that the formula ([6]) holds! It is important to understand this difference between the 
BosonSampling problem on the one hand, and NP problems such as Factoring on the other. 
Unlike with FACTORING, there does not seem to be any witness for BosonSampling that a classical 
computer can efficiently verify, much less a witness that a boson computer can producej^ This 
means that, when n is very large (say, more than 100), even if a linear-optics device is correctly 
solving BosonSampling, there might be no feasible way to prove this without presupposing the 
truth of the physical laws being tested! Thus, for experimental purposes, the most useful values 
of n are presumably those for which a classical computer has some difficulty computing an n x n 
permanent, but can nevertheless do so in order to confirm the results. We estimate this range as 
10 < n < 50. 

But how exactly should one verify formula ([6])? One approach would be to perform full quantum 
state tomography on the output state of a linear-optical network, or at least to characterize the 
distribution over photon numbers. However, this approach would require a number of experimental 
runs that grows exponentially with n, and is probably not needed. 

Instead, given a system with n identical photons and m > n modes, one could do something 
like the following: 

(1) Prepare the "standard initial state" |ln)) in which modes 1, . . . ,n are occupied with a single 
photon each and modes n + 1, . . . , m are unoccupied. 

(2) By passing the photons through a suitable network of beamsplitters and phaseshifters, apply 
an m X m mode- mixing unitary transformation U. This maps the state \ln) to ^p{U) \ ln), 
where (p {U) is the induced action of U on n-photon states. 

(3) For each mode i G [m], measure the number of photons Si in the i*^ mode. This collapses 
the state ip{U)\ln) to some \S) = \si, . . . , Sm) , where si,...,Sm are nonnegative integers 
summing to n. 

(4) Using a classical computer, calculate |Per (?7i„,5)P / si\ ■ ■ ■ Smh the theoretical probability of 
observing the basis state | S) . 

^''indeed, given a matrix X G j;;"X"^ there cannot in general be an NP witness proving the value of Per(X), 
unless P*'' = P'^'^ and the polynomial hierarchy collapses. On the other hand, this argument does not rule out 
an interactive protocol with a BPP verifier and a BosonSampling prover. Whether any such protocol exists for 
verifying statements not in BPP is an extremely interesting open problem. 
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(5) Repeat steps (1) to (4), for a number of repetitions that scales polynomially with n and m. 

(6) Plot the empirical frequency of |Per (C/i„,5)|^ /si\ ■ ■ ■ s^! > x for all x £ [0, 1], with particular 
focus on the range x w Check for agreement with the frequencies predicted 
by quantum mechanics (which can again be calculated using a classical computer, either 
deterministically or via Monte Carlo simulation). 



The procedure above does not prove that the final state is (p{U)\ln)- However, it at least 
checks that the basis states \S) with large values of |Per (C/i^^5)|^ are more likely to be observed 
than those with small values of |Per (f7i„,5)|^, in the manner predicted by formula ([6]). 



6.2 Physical Resource Requirements 

We now make some miscellaneous remarks about the physical resource requirements for our exper- 
iment. 

Platform. The obvious platform for our proposed experiment is linear optics. However, one 
could also do the experiment (for example) in a solid-state system, using bosonic excitations. What 
is essential is just that the excitations behave as indistinguishable bosons when they are far apart. 
In other words, the amplitude for n excitations to transition from one basis state to another must 
be given by the permanent of an n x n matrix of transition amplitudes for the individual excitations. 
On the other hand, the more general formula ([6]) need not hold; that is, it is acceptable for the 
bosonic approximation to break down for processes that involve multiple excitations in the same 
mode. (The reason is that the events that most interest us do not involve collisions anyway.) 

Initial state. In our experiment, the initial state would ideally consist of at most one photon 
per mode: that is, single-photon Fock states. This is already a nontrivial requirement, since a 
standard laser outputs not Fock states but coherent states, which have the form 
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for some a £ C (In other words, sometimes there are zero photons, sometimes one, sometimes 
two, etc., with the number of photons following a Poisson distribution.) Fortunately, the task of 
building reliable single-photon sources is an extremely well-known one in quantum optics [40J , and 
the technology to generate single-photon Fock states has been steadily improving over the past 
decade. 

Still, one can ask whether any analogue of our computational hardness results goes through, 
if the inputs are coherent states rather than Fock states. Bartlett and Sanders [H| have shown 
that, if the inputs to a linear-optical network are coherent states, and the measurements are so- 
called homodyne (or more generally Gaussian) measurements, then the probability distribution 
over measurement outcomes can be sampled in classical polynomial time. Intuitively, in this case 
the photons behave like classical waves, so there is no possibility of a superpolynomial quantum 
speedup. 

On the other hand, if we have coherent-state inputs and measurements in the photon-number 
basis, then the situation is more complicated. As pointed out in Section [4.31 in this case Theorem[T] 
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still holds: using postselection, one can prove that exact classical simulation of the linear-optics ex- 
periment would collapse the polynomial hierarchy. However, we do not know whether approximate 
classical simulation would already have surprising complexity consequences in this case. 

Measurements. For our experiment, it is desirable to have an array of m photodetectors, 
which reliably measure the number of photons Si in each mode i £ [m]. However, it would also 
suffice to use detectors that only measure whether each Sj is zero or nonzero. This is because 
our hardness results talk only about basis states \S) = |,si, . . . , Sm) that are collision-free, meaning 
that Si £ {0,1} for all i G [m]. Thus, one could simply postselect on the runs in which exactly 
n of the m detectors record a photon, in which case one knows that Si = 1 for the corresponding 
modes i, while Sj = for the remaining m — n modes. (In Appendix 1131 we will prove a "Boson 
Birthday Bound," which shows that as long as m is sufficiently large and the mode-mixing unitary 
U is Haar-random, this postselection step succeeds with probability close to 1. Intuitively, if m is 
large enough, then collision-free basis states are the overwhelming majority.) 

What might not suffice are so-called Gaussian measurements. As mentioned earlier, if the 
measurements are Gaussian and the inputs are coherent states, then Bartlett and Sanders [8] 
showed that no super polynomial quantum speedup is possible. We do not know what the situation 
is if the measurements are Gaussian and the inputs are single-photon Fock states. 

Like single-photon sources, photodetectors have improved dramatically over the past decade, 
but of course no detector will be 100% efficiently As we discuss later, the higher the photodetector 
efficiencies, the less need there is for postselection, and therefore, the more easily one can scale to 
larger numbers of photons. 

Number of photons n. An obvious question is how many photons are needed for our 
experiment. The short answer is simply "the more, the better!" The goal of the experiment is 
to confirm that, for every positive integer n, the transition amplitudes for n identical bosons are 
given by n X n permanents, as quantum mechanics predicts. So the larger the n, the stronger the 
evidence for this claim, and the greater the strain on any competing interpretation. 

At present, it seems fair to say that our experiment has already been done for n = 2 (this is the 
Hong-Ou-Mandel dip [29j). However, we are not aware of any experiment directly testing formula 
([6]) even for n = 3. Experimentalists we consulted expressed the view that this is mostly just a 
matter of insufficient motivation before now, and that the n = 3 and even n = 4 cases ought to be 
feasible with current technology. 

Of course, the most interesting regime for computer science is the one where n is large enough 
that a classical computer would have difficulty computing an n x n permanent. The best known 
classical algorithm for the permanent, Ryser's algorithm, uses about 2"^^n^ floating-point oper- 
ations. If n = 10, then this is about 200, 000 operations; if n = 20, it is about 800 million; if 
n = 30, it is about 2 trillion. In any of these cases, it would be exciting to perform a linear-optics 
experiment that "almost-instantly" sampled from a distribution in which the probabilities were 
given by n X n permanents. 

Number of modes m. Another important question is how many modes are needed for our 
experiment. We showed in Theorem [3] that it suffices to use m = O log^ ^) modes, which 
is polynomial in n but impractical. We strongly believe that an improved analysis could yield 
m = O {n?) ■ On the other hand, by the birthday paradox, we cannot have fewer than m = 17 {v?^ 

^^Here the "efficiency" of a piiotodetector refers to tlie probability of its detecting a photon that is present. 
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modes, if we want the state ip{U)\ln) to be dominated by collision-free photon configurations 
(meaning those containing at most one photon per mode). 

Unfortunately, a quadratic number of modes might still be difficult to arrange in practice. So 
the question arises: what would happen if we ran our experiment with a linear number of modes, 
m = O (n)? In that case, almost every basis state would contain collisions, so our formal argument 
for the classical hardness of approximate BosonSampling, based on Conjectures [6] and El would no 
longer apply. On the other hand, we suspect it would still be true that sampling is classically hard! 
Giving a formal argument for the hardness of approximate BosonSampling, with n photons and 
m = O (n) modes, is an important technical challenge that we leave. 

In the meantime, if the goal of one's experiment is just to verify that the permanent formula 
([6]) remains correct for large values of n, then large numbers of photon collisions are presumably 
acceptable. In this case, it should suffice to set m ^ n, or possibly even m <^ n (though note that 
it is easy to give a classical simulation algorithm that runs in n*^*-™^ time). 

Choice of unitary transformation U. One could look for an n-photon Hong-Ou-Mandel 
dip using any unitary transformation U that produces nontrivial interference among n of the m 
modes. However, some choices of U are more interesting than others. The prescription suggested 
by our results is to choose U randomly, according to the Haar measure over mxm unitaries. Once 
U is chosen, one can then "hardwire" a network of beamsplitters and phaseshifters that produces 
U. 

There are at least three reasons why using a Haar-random U seems like a good idea: 

(1) Theorem 1351 showed that any sufficiently small submatrix of a Haar-random unitary matrix 
U is close to a matrix of i.i.d. Gaussians. This extremely useful fact is what let us prove 
Theorem [21 which relates the hardness of approximate BosonSampling to the hardness of 
more "natural" problems that have nothing to do with unitary matrices. 

(2) Setting aside our results, the Haar measure is the unique rotationally-invariant measure over 
unitaries. This makes it an obvious choice, if the goal is to avoid any "special structure" 
that might make the BosonSampling problem easy. 

(3) In the linear-optics model, one simple way to apply a Haar-random mxm unitary matrix U 
is via a network of poly (m) randomly- chosen beamsplitters and phaseshifters. 

Optical elements. One might worry about the number of beamsplitters and phaseshifters 
needed to implement an arbitrary mxm unitary transformation U, or a Haar-random U in par- 
ticular. And indeed, the upper bound of Reck et al. [l6] (Lemma I14p shows only that O (m^) 
beamsplitters and phaseshifters suffice to implement any unitary, and this is easily seen to be tight 
by a dimension argument. Unfortunately, a network of ~ optical elements might already strain 
the limits of practicality, especially if m has been chosen to be quadratically larger than n. 

Happily, Section 16.31 will show how to reduce the number of optical elements from O (m^) 
to 0{mn), by exploiting a simple observation: namely, we only care about the optical network's 
behavior on the first n modes, since the standard initial state has no photons in the remaining 
m — n modes anyway. Section [6.31 will also show how to "parallelize" the resulting optical network, 
so that the O {mn) beamsplitters and phaseshifters are arranged into only O (n log m) layers. 
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Whether one can parallehze hnear-optics computations stih further, and whether one can sample 
from hard distributions using even fewer optical elements (say, O (mlogm)), are interesting topics 
for future work. 

Error. There are many sources of error in our experiment; understanding and controlling the 
errors is perhaps the central challenge an experimentalist will face. At the most obvious level: 

(1) Generation of single-photon Fock states will not be perfectly reliable. 

(2) The beamsplitters and phaseshifters will not induce exactly the desired unitary transforma- 
tions. 

(3) Each photon will have some probability of "getting lost along the way." 

(4) The photodetectors will not have perfect efficiency. 

(5) If the lengths of the optical fibers are not well-calibrated, or the single-photon sources are 
not synchronized, or there is vibration, etc., then the photons will generally arrive at the 
photodetectors at different times. 

If (5) occurs, then the photons effectively become distinguishable, and the amplitudes will no 
longer correspond to n x n permanents. So then how well-synchronized do the photons need to be? 
To answer this question, recall that each photon is actually a Gaussian wavepacket in the position 
basis, rather than a localized point. For formula ([6]) to hold, what is necessary is that the photons 
arrive at the photodetectors within a short enough time interval that their wavepackets have large 
pairwise overlaps. 

The fundamental worry is that, as we increase the number of photons n, the probability of a 
successful run of the experiment might decrease like c~". In practice, experimentalists usually 
deal with such behavior by postselecting on the successful runs. In our context, that could mean 
(for example) that we only count the runs in which n detectors register a photon simultaneously, 
even if such runs are exponentially unlikely. We expect that any realistic implementation of our 
experiment would involve at least some postselection. However, if the eventual goal is to scale 
to large values of n, then any need to postselect on an event with probability c~" presents an 
obvious barrier. Indeed, from an asymptotic perspective, this sort of postselection defeats the 
entire purpose of using a quantum computer rather than a classical computer. 

For this reason, while even a heavily-postselected Hong-Ou-Mandel dip with (say) n = 3, 4, 
or 5 photons would be interesting, our real hope is that it will ultimately be possible to scale our 
experiment to interestingly large values of n, while maintaining a total error that is closer to than 
to 1. However, supposing this turns out to be possible, one can still ask: how close to does the 
error need to be? 

Unfortunately, just like with the question of how many photons are needed, it is difficult to give 
a direct answer, because of the reliance of our results on asymptotics. What Theorem [3] shows is 
that, if one can scale the BosonSampling experiment to n photons and error 5 in total variation 
distance, using an amount of "experimental effort" that scales polynomially with both n and 1/6, 
then modulo our complexity conjectures, the Extended Church- Turing Thesis is false. The trouble 
is that no finite experiment can ever prove (or disprove) the claim that scaling to n photons and 
error 5 takes poly {n, 1/(5) experimental effort. One can, however, build a circumstantial case for 
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this claim — by increasing n, decreasing 5, and making it clear that, with reasonable effort, one 
could have increased n and decreased 6 still further. 

One challenge we leave is to prove a computational hardness result that works for a fixed 
(say, constant) error S, rather than treating 1/5 as an input parameter to the sampling algorithm 
along with n. A second challenge is whether any nontrivial error-correction is possible within the 
noninteracting-boson model. In standard quantum computing, the famous Threshold Theorem 
[71136] asserts that there exists a constant r > such that, even if each qubit fails with independent 
probability r at each time step, one can still "correct errors faster than they happen," and thereby 
perform an arbitrarily long quantum computation. In principle, the Threshold Theorem could be 
applied to our experiment, to deal with all the sources of error listed above. The issue is that, 
if we have the physical resources available for fault-tolerant quantum computing, then perhaps 
we ought to forget about BosonSampling, and simply run a universal quantum computation! 
What we want, ideally, is a way to reduce the error in our experiment, without giving up on the 
implementation advantages that make the experiment attractive in the first place. 



6.3 Reducing the Size and Depth of Optical Networks 

In this section, we discuss how best to realize an m x m unitary transformation U, acting on 
the initial state \ln), as a product of beamsplitters and phaseshifters. If we implement U in the 
"obvious" way — by appealing to Lemma [H] — then the number of optical elements and the depth 
will both be O (m-^) . However, we can obtain a significant improvement by noticing that our goal 

is just to apply some unitary transformation U such that ip{U) |1„) = (p (U) \ ln)- we do not care 
about the behavior on U on inputs other than This yields a network in which the number of 

optical elements and the depth are both O {ran). 

The following theorem shows that we can reduce the depth further, to O (nlogm), by exploiting 
parallelization. 

Theorem 45 (Parallelization of Linear-Optics Circuits) Given any m x m unitary opera- 
tion U, one can map the initial state to (p{U)\ln) using a linear- optical network of depth 
O (nlogm), consisting of O (mn) beamsplitters and phaseshifters. 

Proof. We will consider a linear-optics system with m -\- n modes. Let 

U 



■ / 

be a unitary transformation that acts as U on the first m modes, and as the identity on the 
remaining n modes. Then our goal will be to map to ip (V) |ln)- 

Let |ej) be the basis state that consists of a single photon in mode i, and no photons in the 
remaining m -\- n — 1 modes. Also, let {ipi) = V\ei). Then it clearly suffices to implement some 
unitary transformation V that maps |ej) to for all i G [n] — for then ^(V) |ln) = ^i^) I In) by 
linearity. 

Our first claim is that, for each i G [n] individually, there exists a unitary transformation Vi that 
maps |ej) to and that can be implemented by a linear-optical network of depth log2 m + O (1) 
with O (m) optical elements. To implement Vi, we use a binary doubling strategy: first map |ej) 
to a superposition of the first two modes, 

\zi) = ai lei) + 02 |e2) . 
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Then, by using two beamsphtters in parallel, map the above state l^i) to a superposition of the 
first four modes, 

1^2) = ai |ei) + ai |e2) + 03 jes) + 04 \e^^ . 

Next, by using four beamsplitters in parallel, map \z2) to a superposition {z^} of the first eight 
modes, and so on until is reached. It is clear that the total depth required is log2 m + O (1), 
while the number of optical elements required is O (m). This proves the claim. 

Now let Si be a unitary transformation that swaps modes i and m + i, and that acts as the 
identity on the remaining m + n — 2 modes. Then we will implement V as follows: 

V = VnSnV;i ^252^2^ • ViS^V^ • S„ • • • 5i. 

In other words: first swap modes 1, . . . , n with modes m+1,... ,m + n. Then, for all i := 1 to n, 
apply ViSiV^. 

Since each Si involves only one optical element, while each Vi and involves O (m) optical 
elements and O (log m) depth, it is clear that we can implement V using a linear-optical network 
of depth O (n log m) with O {mn) optical elements. 

To prove the theorem, we need to verify that V \ei) = for all i G [n]. We do so in three 
steps. First, notice that for all i e [n], 

ViSiV^ {Si la)) = ViSiV^ \em+i) 
= ViSi le^+i) 
= Vi \ei) 
= \A)- 

where the second line follows since acts only on the first m modes. 
Second, for all i,j G [n] with i 7^ j, 

VjSjVj ICto+j) = 1 6^72+ j) , 

since Vj and Sj both act as the identity on \em+i)- 

Third, notice that {ipi\ipj) = for all i ^ j, since \ijji) and correspond to two different 
columns of the unitary matrix U. Since unitaries preserve inner product, this means that Vj iV^i) 
is also orthogonal to VJ lipj) = VjVj \ej) = jcj): in other words, the state VJ \i/ji) has no support 

on the j'*'' mode. It follows that Sj acts as the identity on Vj \ipi) — and therefore, for all i,j G [n] 
with i ^ j, we have 

Summarizing, we find that for all i E [n]: 

• ViSiV^ maps \em+i) to \ipi). 

• VjSjVj maps le^+j) to itself for all j < i. 

• VjSjVj maps Itpi) to itself for all j > i. 

We conclude that V |ej) = ViSiV^ \em+i) = IV'j) fo^ ^ ^ N- This proves the theorem. ■ 
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7 Reducing GPEx to |GPE|^ 

The goal of this section is to prove Theorem [71 that, assuming Conjectured] (the Permanent Anti- 
Concentration Conjecture), the GPEx and |GPE|^ problems are polynomial-time equivalent. Or 
in words: if we can additively estimate |Per(X)|^ with high probability over a Gaussian matrix 
X ~ ^"X"^ then we can also multiplicatively estimate Per {X) with high probability over a Gaussian 
matrix X. 

Given as input a matrix X ~ A/" (0, 1)^^" of i.i.d. Gaussians, together with error bounds e, 5 > 0, 
recall that the GPEx problem (Problem^ asks us to estimate Per {X) to within error ibe-|Per 
with probability at least 1 — 5 over X, in poly (n, 1/e, 1/5) time. Meanwhile, the |GPE|^ problem 
(Problem [2]) asks us to estimate |Per (X)|^ to within error ibe • n!, with probability at least 1 — 8 
over X, in poly (n, 1/e, 1/5) time. It is easy to give a reduction from |GPE|^ to GPEx. The 
hard direction, and the one that requires Conjecture [H is to reduce GPEx to |GPE|^. 

While technical, this reduction is essential for establishing the connection we want between 

(1) Theorem [3] (our main result), which relates the classical hardness of BosonSampling to 

|2 
l±' 



|GPE|+, and 



(2) Conjecture [5] (the Permanent-of-Gaussians Conjecture), which asserts that the Gaussian 
Permanent Estimation problem is #P-hard, in the more "natural" setting of multiplicative 
rather than additive estimation, and Per {X) rather than |Per (X)|^. 

I 1 2 

Besides GPEx and |GPE|_|_, one could of course also define two "hybrid" problems: GPE-t 
(additive estimation of Per {X)), and |GPE|^ (multiplicative estimation of |Per (X)|^). Mercifully, 
we will not need to make explicit use of these hybrid problems. Indeed, assuming Conjecture [H 
they will simply become equivalent to GPEx and |GPE|_|_ as a byproduct. 

2 

Let us start by proving the easy direction of the equivalence between GPEx and |GPE|_|_. This 
direction does not rely on any unproved conjectures. 

2 

Lemma 46 |GPE|_|_ is polynomial-time reducible to GPEx. 

Proof. Suppose we have a polynomial-time algorithm M that, given 0"'^/'^, 0"'^/'^), outputs a 
good multiplicative approximation to Per [X] — that is, a z such that 

|z-Per(X)| <e|Per(X)| 

— with probability at least 1 — 5 over X ~ gnxn^ Then certainly is a good multiplicative 

2 



approximation to |Per(X)| : 

\zf - |Per(X)|2 = \\z\ - |Per(X)|| {\z\ + |Per(X)|) 

<e{2 + e) |Per(X)|2 
< 3e|Per (X)|^ 

We claim that is also a good additive approximation to |Per(X)|^, with high probability over 
X. For by Markov's inequality. 



Pr 

X 



|Per(X)|^ > k-n\ 



1 
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So by the union bound, 



Pr 

X 



I Per {X)Y 



> ek ■ n\ 



< Pr 

X 

<6 + 



Per(X)|2 >3e|Per(X)p 



+ Pr 

X 



3e|Per (X)|^ > ek 



n! 



A; 



Thus, we can achieve any desired additive error bounds ( 

< 6'. 



and (5 + f 



e', (5') by (for example) setting e := e'5' /6, 
Clearly this increases M's running time 



6 := 6' /2, and k := 6/5', so that ek - 
by at most a polynomial factor. ■ 

We now prove that, assuming the Permanent Anti-Concentration Conjecture, approximating 
|Per(X)|^ for a Gaussian random matrix X ~ ^"-x" hard as approximating Per (X) itself. 



This result can be seen as an average-case analogue of Theorem [2SJ To prove it, we need to give 
a reduction that estimates the phase Per (X) /|Per(X)| of a permanent Per(X), given only the 
ability to estimate |Per(X)| (for most Gaussian matrices X). As in the proof of Theorem 128^ 
our reduction proceeds by induction on n: we assume the ability to estimate Per (Y) for a certain 
(n — 1) X (n — 1) submatrix Y of X, and then use that (together with estimates of |Per {X')\ for 
various nxn matrices X') to estimate Per(X). Unfortunately, the reduction and its analysis 
are more complicated than in Theorem 128^ since in this case, we can only assume that our oracle 



estimates |Per {X)\ with high probability if X "looks like" a Gaussian matrix. This rules out the 



adaptive reduction of Theorem 1281 which even starting with a Gaussian matrix X, would vary the 
top-left entry so as to produce new matrices X' that look nothing like Gaussian matrices. Instead, 
we will use a nonadaptive reduction, which in turn necessitates a more delicate error analysis, as 
well as an appeal to Conjecture El 

To do the error analysis, we first need a technical lemma about the numerical stability of 
triangulation. By triangulation, we simply mean a procedure that determines a point x G W^, 
given the Euclidean distances A {x,yi) between x and d + 1 fixed points yi, . . . ,yd+i S I^'^ that 
are in general position. So for example, the d = 3 case corresponds to how a GPS receiver would 
calculate its position given its distances to four satellites. We will be interested in the d = 2 
case, which corresponds to calculating an unknown complex number x = Per (X) £ C given the 
squared Euclidean distances |x — , |x — ^2!^ , l^; — ysl^i for some yi, ^2, ys £ C that are in general 
position. The question that interests us is this: 

1 2 I 1 2 I 1 2 

Suppose our estimates of the squared distances \x — yi\ , |x — ^2! , F — yal cL^e noisy, 
and our estimates of the points yi,y2,y3 are also noisy. How much noise does that 
induce in our resulting estimate of x? 

The following lemma answers that question, in the special case where yi = 0, y2 = w, ys = iw 
for some complex number w. 

Lemma 47 (Stability of Triangulation) Let z = re*^ £ C be a hidden complex number that 
we are trying to estimate, and let w = ce*"^ £ C be a second "reference" number (r,c > 0, 9,t € 
(— 7r,7r]J. For some known constant A > 0, let 



R 
S 
T 
C 



iXw\ 



2 I \2 2 

r + A c 



2Arccos {9 — r) 



r'^ + X^c^ - 2\rcs\.n{e - t) , 



I |2 2 
\w\ = C . 
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Suppose we are given approximations R, S, T, C, r to R, S, T, C, r respectively, such that 



R-R , S-S , T-T < eye, 
C-C <eC. 

Suppose also that £ < ^ min {l, ]^}- Then the approximation 

'r + c-s^ 



e := T + sgQ.{R + C - arccos 



satisfies 



2VRC 



ic 



mod27r < |f - r| + 1.37^/s + ■ 



Proof. Without loss of generality, we can set A := 1; the result for general A > then follows by 
replacing w with Xw and C with X^C. 

Let a := R/C, /3 := S/C, and 7 := T/C, and note that a > 2e. Observe that 

R+C-S a+l-P 



cos {9 — t) 
sin {0 — t) = 



So we can write 



R + C-T _ q; + 1-7 
2^fRC ~ 2v^ 

' a + l- 



9 = T + b arccos 

where 6€ { — 1,1} is a sign term given by 

b := sgn (6* — r) = sgn (sin {9 — r)) = sgn (a + 1 — 7) . 

Now let 5 := 3^ := 5/C, 7 := T/C, and X := ^/C- Note that |5 - q;| , P - /3 , 17 - 7I , |x - 1| < 
£. Let 

b := sgn (5 + X - 7) > 

5 + x - /3 \ 



:= r + 6 arccos 



2Vax 



We now consider two cases. First suppose |q; + 1 — 7I < 3e. Then \2-s/a sin {9 — t)\ < 3e, 
which implies 

sin^ {e-T)<^. 



Likewise, we have 



2v aX sin 



4a 



= |a + x - 7I 

< |a + 1 - 7I + |a - a| + Ix - 1| + I7 - 7I 

< 6s 
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and hence 



So if we write 



sm 



(6£) 



(2v^)" 



< 



(a — e) (1 — e) 



9 = T + b arccos (cos {6 — t)) , 
9 = T + b arccos ^cos ~ ^) ) ' 



we find that 



9-9 



\t — t\ < |arccos (cos {9 — t))| + 



arccos cos 



9e2 

< arccos \ / 1 — — h arccos W 1 



9e2 



(a — e) (1 — s) 



= arcsin 
< 5.32 



Se 

3e 



+ arcsin 



3e 



y/{a-e){l- e) 
3s \ 



V(a-e)(l-e) 



Here the last two hnes used the fact that £ < ^ min{l, a}, together with the inequahty arcsin x < 
l.lx for small enough x. 

Next suppose |a + 1 — 7I > Se. Then by the triangle inequality, 



|a + A — 7I — |a + 1 — 7II < la — a| + I7 — 7I + Ix — 1| < Se, 
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which imphes that sgn (a + X ~ 7) = sgn (a + 1 — 7) and hence b = b. So 



r — r < 



arccos 



< arccos 



I a + x- P 
a + 1 — (3 — 3e 



arccos 



arccos 



a + 1 - /3 

a + 1- (3 
2^ 



< 



< 



3e 



+ |a + l-/3| 



'ax 



3e 



\ 2 V(a - 6) (1 - e) 



+ 2Va 



1 



2. 



'"X 



^ 3 



3e 



2 y 2y/(0.9a) (0.9) 



+ 




< 



3 5e 



+ 



a 



2\ yy {a - e) {1 - e) 



< 



3 5e 1 

+ 



a 



2y 3^/a 2 \{a-e){l-e) 



5 _^ (1 + a) 



2"^ V 3\/a 2 (0.9a) (0.9) 



'5 1 



+ ^ + 1 



6 \/ a x/a 



1 



< 1.37^/1 ( ^ + 1 ) . 



Here the second hne used the monotonicity of the arccos function, the third hue used the inequahty 

arccos {x — e) — arccos x < l.S-v/e 

for e < ^ , and the fifth and ninth hues used the fact that e < min | ^ , ^ } . Combining the two 
cases, we have 

< I? - t| + max I 5.32^, 1.37Ve ( ^ + 1 



a 



a 



Using the fact that s < min { ^ j ^ } j one can check that the second item in the maximum is always 
greater. Therefore 



1 



C 



< |?-r| + 1.37V^ ( — + 1 ) = \7 - t\ + 1.37^/l I a/;^ + 1 



as claimed. ■ 

We will also need a lemma about the autocorrelation of the Gaussian distribution, which will 
be reused in Section [9l 
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Lemma 48 (Autocorrelation of Gaussian Distribution) Consider the distributions 



Vi =AA(0,(l-e 



N 



V2 = \{M{Vi,l\ 



1=1 



for some vector v G C^. We have 



< \\v\ 



2 • 



Proof. It will be helpful to think of each complex coordinate as two real coordinates, in which 
case = M (0, 1/2)^''^ and u is a vector in 
For the first part, we have 



\Vi-Q 



N\ 



< 2N 


,( 


N 


/•oo 




J — 00 


< 2Ne 





-xV(l-e)' 



MO 



dx 



where the first line follows from the triangle inequality and the last line from straightforward 
estimates. 

For the second part, by the rotational invariance of the Gaussian distribution, the variation 
distance is unaffected if we replace v by any other vector with the same 2-norm. So let v := 



{£,0,..., 0) where 

11^2-^^11 = 
< 



Then 



a;i,...,X2jv=-oo 
00 



g (xi £) g X'l 



TT 



TT 



2^^ 



-{x-ey 



dx 



e 1 e 



vr 



g -l'2JV 



dx\ ■ ■ ■ dx2N 



where the last line follows from straightforward estimates. ■ 

Using Lemmas ST] and HHI we can now complete the proof of Theorem [7) that assuming Con- 
jecture [6] (the Permanent Anti-Concentration Conjecture), the GPEx and |GPE|^ problems are 
polynomial-time equivalent. 

Proof of Theorem [T]. Lemma H6l already gave an unconditional reduction from |GPE|^ to 

2 

GPEx. So it suffices to give a reduction from GPEx to |GPE|_|_, assuming the Permanent Anti- 
Concentration Conjecture. 

Throughout the proof, we will fix an iV x iV input matrix X = (xij) S C^^^, which we think 
of as sampled from the Gaussian distribution Q^^^ , Probabilities will always be with respect to 
, For convenience, we will often assume that "bad events" (i.e., estimates of various 



X ^ Q 
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quantities outside the desired error bounds) simply do not occur; then, at the end, we will use the 
union bound to show that the assumption was justified. 

The GPEx problem can be stated as follows. Given the input {^X, 0^/^, 0^/'') for some e,6 > 0, 
output a complex number z G C such that 

|z-Per(X)| <e|Per(X)|, 

with success probability at least 1 — 5 over X, in time poly {N, 1/5). 

Let O be an oracle that solves |GPE|_|_. That is, given an input (^,0i/%0i/^) where A IS an 
nxn complex matrix, O outputs a nonnegative real number O ((A,0^^'^,0^^^)) such that 



Pr 



O ((a,0^/%0^/^)) - |Per(^)|2 < e|Per(A)| 



> 1- A. 



Then assuming Conjecture^ we will show how to solve the GPEx instance (X, 0^/"^, 0^/*^) in time 
poly (A^, l/£,l/6), with the help of 3A^ nonadaptive queries to O. 

Let R = |Per (^)|^. Then by simply calling O on the input matrix X, we can obtain a good 



< sR/W. Therefore, our problem reduces to 
In other words, we need to give a procedure that 
< 0.9e, and does so with high probability. 



approximation R to R, such that (say) R — R 

estimating the phase = Per (X) / |Per {X)\. 

returns an approximation 6 to 6 such that (say) 

(Here and throughout, it is understood that all differences between angles are mod27r.) 

For all n G [A'"], let Xn be the bottom-right nxn submatrix of X (thus Xj^ = X). A crucial 
observation is that, since X is a sample from Q^^^ ^ each X„ can be thought of as a sample from 

Qnxn 

As in Theorem 1281 given a complex number w and a matrix A = (aij), let A^'^^ be the matrix 
that is identical to A, except that its top-left entry equals an — w instead of an. Then for any 
n and w, we can think of the matrix xli"^ as having been drawn from a distribution vl^^ that is 
identical to ^"-^"^ except that the top-left entry is distributed according to {—w, 1)q rather than 
Q. Recall that by Lemma BHl the variation distance between and Q^^^ satisfies 



< \w\ 



Let A > be a parameter to be determined later. Then for each n E [A^], we will be interested 



in two specific nxn matrices besides X„, namely X, 
reduction will be based on the identities 



[A] 



and Xn^'^ . Similarly to Theorem [28} our 



Per f X[^l 
Per ( Xi!^^ 



Per(X„) 
Per(X„) 



A Per (X„_i) 
UPer [Xn-i] 



More concretely, let 



Rn 
On 

Sn 
Tn 



|Per {Xn[ 
Per {Xn 
I Per (X„ 

Per 



Per 



xw 

n 



I Per (X„) 
I Per (X„ 



APer(X„_i)|' 
-zAPer (X„_i) 
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Then some simple algebra — identical to what appeared in Lemma HT] — yields the identity 

Rn + Rn~l — Sn 



'n — '-'n- 



-1 + sgn {Rn + Rn-i - Tn) arccos 



2^ RnRn-l 



for all n > 2. "Unravelling" this recursive identity, we obtain a useful formula loi = 9^ = | p^^|^| | : 

TV 

n XNN 



\xnn\ 



n=2 



where 



:= sgn {Rn + -Rn-i - Tn) arccos 



Rn + Rn—1 Sn 
2\/ RnRn-l 



Our procedure to approximate 9 will simply consist of evaluating the above expression for all n > 2, 
but using estimates Sn,Tn produced by the oracle O in place of the true values Rn, Sn,Tn. 

' — ' 1 2 

In more detail, let -Ri := |a;ArAr| , and for all n > 2, let 



Rr,.:=O{(Xn,0'^',0 I 



5„:=0((XW,OVSOVA 

where e, A > 1/ poly (A^) are parameters to be determined later. Then our procedure for approxi- 
mating Q is to return 



N 



n=2 



where 



■= Sgn ( Rn + i?n-i " Tn I arccos 



Rn ~\~ Rn—1 Sn 



Clearly this procedure runs in polynomial 
We now upper-bound the error 



2y RnRn-l 

time and makes at most 3A^ nonadaptive calls to O. 
incurred in the approximation. Since 

N 



n=2 



it suffices to upper-bound 

Pr 
Pr 



Pr 



^n (,r, 



for each n. By the definition of O, for all n G [N] we have 



Rn Rn 
Sn ~ Sn 



^n -^n 



< eRn 

< eSn 

< eT„ 



> 1 - A, 

> 1 - A 

> 1 - A - A, 

> 1 - A 

> 1 - A- A 



p[A] _ gnxn 



2^[jA] Qnxn 
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Also, let p (n, 1 / (3) be a polynomial such that 



Pr 



|Per(y4)r > 



n! 



>l-/3 



for all n and /3 > 0; such a p is guaranteed to exist by Conjecture [6l It will later be convenient to 
assume p is monotone. Then 



Pr 



Rn > 



>l-/3 



p(n,l//3). 

In the other direction, for all < k < 1 Markov's inequality gives us 



Pr 
Pr 
Pr 



Rn < — 

's^ < ^' 

T < 



>1-K, 

>l-K-\, 

>1-K-X, 



where we have again used the fact that Sn, Tn are random variables with variation distance at most 
A from Rn- Now think of /3,k > 1/ poly (A^) as parameters to be determined later, and suppose 
that all seven of the events listed above hold, for all n G [N]. In that case. 



Rn Rn 



< eRn 



< e— 

K 



< € 



Rn-in [n - 1)! 

K Rn-l 

Rn-in 



< € 



Rn-lN 



p(n-l,l//3) 



p(A,l//3) 



and likewise 



eN.p{N,l/(3) ,2j, 
^A^ ^ 

eN-p{N,l/^),2 



^n 



< 



kA2 



A Rn-l- 



Plugging the above bounds into Lemma H71 we find that, if there are no "bad events," then noisy 
triangulation returns an estimate ^„ of ^„ such that 



< 1.37 



< 1.37 



eN-piN,l/f3) 



^2 



aJ^ + i 

Rn 



eN-p{N,l/P) 



CA2 



/ {n-l)\/K 
n!/p(iV,l//3) 



+ 1 



< 1.37Ve 



/ p(jV,l//3) ^ VN^piNMP) 
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We now upper-bound the probability of a bad event. Taking the union bound over all n G [N] and 
all seven possible bad events, we find that the total probability that the procedure fails is at most 



PFAIL := (3A + 3/t + 4A + /3) N. 

Thus, let us now make the choices A,k := j^, A := and /3 := so that pfail < 5 as 

desired. Let us also make the choice 



2x3 



€ :- 



7120N^p {N, 4N/6) 



2 • 



Then 



N 



< 2Z kn - e„ 



n=2 



< 



^ / 12N.p{N,m/6) ^ 32V3N^^piN,m/6) \ ^ 

\ 6 6^/'^ J 



9s 
< — 
- 10 

as desired. Furthermore, if none of the bad events happen, then we get "for free" that 



R-R 



Rn — Rn 



sR 

< .R. < 



So letting r := ^/R and r := yR, by the triangle inequality we have 



re'' -re'' 



< |r — r| + — 2 cos 
R-R 



< 



r + r 



eR 9e 

< h r — 

- lOr 10 

= er 

= e|Per(X)|, 



and hence we have successfully approximated Per {X) = re" 



8 The Distribution of Gaussian Permanents 

In this section, we seek an understanding of the distribution over Per(X), where X ~ gnxn jg 
a matrix of i.i.d. Gaussians. Here, recall that Q = (0,1)5- is the standard complex normal 
distribution, though one suspects that most issues would be similar with M (0, 1)]^, or possibly 
even the uniform distribution over { — 1, 1}. As explained in Section [1.2. 2 1 the reason why we focus 
on the complex Gaussian ensemble is simply that, as shown by Theorem 1351 t^is Gaussian 

ensemble arises naturally when we consider truncations of Haar-random unitary matrices. 
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Our goal is to give evidence in favor of Conjecture [6l the Permanent Anti- Concentration Con- 
jecture (PACC). This is the conjecture that, if X ~ ^"X" Gaussian, then Per (X) is "not too con- 
centrated around 0": a 1 — 1/ poly (n) fraction of its probability mass is greater than \/nI/ poly (n) 
in absolute value, \/nI being the standard deviation. More formally, there exists a polynomial p 
such that for all n and 5 > 0, 



Pr 



I Per (X) I < 



p{n, 1/6) 



< 6. 



An equivalent formulation is that there exist constants C, D and /? > such that for all n and 
e > 0, 



Pr 



|Per(A:)| < e 



Conjecture El has two applications to strengthening the conclusions of this paper. First, it lets 
us multiplicatively estimate Per (X) (that is, solve the GPEx problem), assuming only that we 
can additively estimate Per (X) (that is, solve the GPE-t problem). Indeed, if Conjecture [6] holds, 
then as pointed out in Lemma HH additive and multiplicative estimation become equivalent for this 
problem. Second, as shown by Theorem [71 Conjecture [6] lets us estimate Per (X) itself, assuming 
we can estimate |Per(X)|^. The bottom line is that, if Conjecture E] holds, then we can base our 
conclusions about the hardness of approximate BosonSampling on the natural conjecture that 
GPEx is #P-hard, rather than the relatively-contrived conjecture that |GPE|_|_ is ^^P-hard. 

At a less formal level, we believe proving Conjecture [6] might also provide intuition essential to 
proving the "bigger" conjecture, that these problems are #P-hard in the first place. 

The closest result to Conjecture [6] that we know of comes from a 2009 paper of Tao and Vu 
|57j . These authors show the following: 



Theorem 49 (Tao-Vu [57J ) For all e > and sufficiently large n, 



Pr 

xe{-i,ir 



I Per (X) I < 



< 



n 



0.1 ' 



Alas, Theorem |39] falls short of what we need in two respects. First, it only upper-bounds 
the probability that |Per(X)| < ^fnXjn^'^^ whereas we need to upper-bound the probability that 
|Per {X)\ < \fn\l poly (n). Second, the upper bound obtained is l/n*^'^ (and Tao and Vu say that 
their technique seems to hit a barrier at 1/ ^/n) , whereas we need an upper bound of 1/ poly (n) . A 
more minor problem is that Theorem 1491 applies to Bernoulli random matrices, not Gaussian ones. 
Of course, the Gaussian case might be easier rather than harder. 

In the rest of the section, we will give three pieces of evidence for Conjecture [6l The first, in 
Section 18. H is that it is supported numerically. The second, in Section 18.21 is that the analogous 
statement holds with the determinant instead of the permanent. The proof of this result makes 
essential use of geometric properties of the determinant, which is why we do not know how to 
extend it to the permanent. On the other hand, Godsil and Gutman [26] observed that, for all 
matrices X = (xjj), 

/ zb^/ill ••• zb^xi; 

Per {X) = E Det 
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where the expectation is over all 2" ways of assigning +'s and — 's to the entries. Because of this 
fact, together with our numerical data, we suspect that the story for the permanent may be similar 
to that for the determinant. The third piece of evidence is that a weaker form of Conjecture [6] 



holds: basically, |Per(X)| has at least a J7(l/n) probability of being il. 



We prove this 



by calculating the fourth moment of Per {X). Unfortunately, extending the calculation to higher 
moments seems difficult. 

Before going further, let us make some elementary remarks about the distribution over Per (X) 
for X ~ gnxn^ gy syninietry, clearly E [Per (X)] = 0. The second moment is also easy to calculate: 



E 



I Per {X)\' 



E 



E 



^ '] J_ J_ •^i,cr{i)-^i,T{i) 



En 



n 



i,cr{i) I 



n\. 



We will often find it convenient to work with the normalized random variable 

|2 



[Per {X )\' 



so that E[P„] = 1. 



8.1 Numerical Data 

Figure H] shows the numerically-computed probability density function of P„ when n = 6. For 
comparison, we have also plotted the pdf of Dn := |Det (^)l'^ /nl. 

The numerical evidence up to n = 10 is strongly consistent with Conjecture [6j Indeed, from 
the data it seems likely that for all < /3 < 2, there exist constants C, D such that for all n and 
e > 0, 



Pr 



I Per (X) I < eVn 
and perhaps the above even holds when (3 = 2. 



< 



8.2 The Analogue for Determinants 

We prove the following theorem, which at least settles Conjecture [6] with the determinant in place 
of the permanent: 

Theorem 50 (Determinant Anti-Concentration Theorem) For all < (5 < 2, there exists 
a constant Cp such that for all n and e > 0, 



Pr 



|Det {X)\ < eVn 



< 
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Figure 4: Probability density functions of the random variables -D„ = |Det /n! and P„ = 
|Per (X)l^ /n!, where X ~ gnxn ^ complex Gaussian random matrix, in the case n = 6. Note 
that E = E [P„] = 1. As n increases, the bends on the left become steeper. We do not know 
whether the pdfs diverge at the origin. 

We leave as an open problem whether Theorem 1501 holds when (3 = 2. 

Compared to the permanent, a lot is known about the determinants of Gaussian matrices. In 
particular, Girko [25] (see also Costello and Vu [161 Appendix A]) have shown that 



In |Det {X)\ - In yj{n-l)\ 

In n 
2 

converges weakly to the normal distribution AA(0, Unfortunately, weak convergence is not 

enough to imply Theorem 1501 so we will have to do some more work. Indeed, we will find that the 
probability density function of |Det (^)|^, in the critical regime where |Det {X)\^ k, 0, is different 
than one might guess from the above formula. 

The key fact about Det (X) that we will use is that we can compute its moments exactly — even 
the fractional and inverse moments. To do so, we use the following beautiful characterization, 
which can be found (for example) in Costello and Vu [161. 

Lemma 51 ([16]) Let X ~ ^"-x" 5g ^ complex Gaussian random matrix. Then |Det(X)|^ has 
the same distribution as 

n 1 i 

n Eifei^ 

i=i \j=i 

where the 's are independent Af (0,1) ^ Gaussians. (In other words, |Det(X)|^ is distributed as 
Ti • • • Tn, where each is an independent random variable with k degrees of freedom.) 

The proof of Lemma [51] (which we omit) uses the interpretation of the determinant as the 
volume of a parallelepiped, together with the spherical symmetry of the Gaussian distribution. 
As with the permanent, it will be convenient to work with the normalized random variable 

|Det(X)|^ 



n< 

so that E [Dn] = 1. Using Lemma ISTl we now calculate the moments of D„ 
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Lemma 52 For all real numbers a > — 1, 



(n!) T{k) 



(If a<-l then E [D'^] = oo.) 
Proof. By Lemma \5T\ 

1 " 

^ ' k=i 

where each is an independent random variable with k degrees of freedom. Now, Tk has 
probabihty density function 



for X > 0. So 



oo 



1 

J' 

r{k + a) 



as long as A; + a > 0. (If /c + a < 0, as can happen if a < —1, then the above integral diverges. 
As a sample application of Lemma [52l if a is a positive integer then we get 



For our application, though, we are interested in the dependence of E on n when a is not nec- 
essarily a positive integer. The next lemma shows that the asymptotic behavior above generalizes 
to negative and fractional a. 

Lemma 53 For all real numbers a > —1, there exists a positive constant Cq, such that 

lim — -, — , , = Cq,. 

Proof. Let us write 

ra + a)"^'r^fc + a + i) 
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Then by Stirling's approximation, 



n—l -n / T . , \ n— 1 



^r(t + a. + i)^^/ ro^ + g + i) 
n t»r(*: + l) r(*: + i) 

n-1 / 



H^ + o(l) + Y In ^ I -alnA; 



k=l 
n—l 



Ha + oil) + Yl(^(y^ + a + ^^ In 



k + a 

a 



i?. + ^ + o(l) + |:((/c + a + l) (f-i^) 

Z 1 \ / 



N a (a + 1) , 
= i/a + Jc, + L„ + o (1) + Inn. 

In the above, Ha, Ja, and Lq, are finite error terms that depend only on a (and not n): 

f., / r(fc + a + l) Vfc(f)^ \ 

Q(a + 1)/. ^1 ^ \ ^a2(2Q + l) 
L„ = lim > - — In n — > 

\ fc=i / fc=i 

_ a (a + 1)7 a2(2a + l)7r2 

~ 2 24fc2 ' 

where 7 ~ 0.577 is the Euler-Mascheroni constant. The o(l)'s represent additional error terms 
that go to as n — >• 00. Hence 



n-l 



and 



= r(l + Q) 6^"+-^"+^", 

which is a positive constant Cq, depending on a. ■ 
We can now complete the proof of Theorem [50l 
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Proof of Theorem 1501 Let a := 



—(3/2. Then by Markov's inequality, for all e > we have 



E 



> Pr 



Det {X)\ I 

Det {X)\ < eVnl 



1 



Hence 



Pr 



|Det(X)| < eV^. < E[D°] • 



for some positive constants Cq, depending only on a and /3 respectively. ■ 
8.3 Weak Version of the PACC 

We prove the following theorem about concentration of Gaussian permanents. 
Theorem 54 (Weak Anti-Concentration of the Permanent) For all a < 1, 

(1 



Pr 

Xr^Q"-'^" 



|Per(X)r >a-n\ 



> 



a) 



n + 1 



While Theorem 1541 falls short of proving Conjecture [6l it at least shows that |Per(X)| has a 
non-negligible probability of being large enough for our application when X is a Gaussian random 
matrix. In other words, it rules out the possibility that |Per {X)\ is almost always tiny compared 
to its expected value, and that only for (say) a 1/ exp (n) fraction of matrices X does |Per (X)| 
become enormous. 

Recall that Pn denotes the random variable |Per (X)|^ /n!, and that E = 1. Our proof of 
Theorem [Ml will proceed by showing that E [P^] = n + 1. As we will see later, it is almost an 
"accident" that this is true — E [P^] , E [P^] , and so on all grow exponentially with n — but it is 
enough to imply Theorem 1541 

To calculate E [P^] , we first need a proposition about the number of cycles in a random permu- 
tation, which can be found in Lange [37, p. 76] for example, though we prove it for completeness. 
Given a permutation a E Sn, let eye (a) be the number of cycles in a. 



Proposition 55 For any constant c > 1, 



E 

cr&Sn 



^cyc (cr) 



n + c — 1 
c-1 



Proof. Assume for simplicity that c is a positive integer. Define a c-colored permutation (on n 
elements) to be a permutation o" G S"™ in which every cycle is colored one of c possible colors. Then 
clearly the number of c-colored permutations equals 

fin) := c^y^'^l 

cr&Sn 
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Now consider forming a c-colored permutation a. There are n possible choices for If 
(t(1) = 1, then we have completed a cycle of length 1, and there are c possible colors for that 
cycle. Therefore the number of c-colored permutations a such that cr (1) = 1 is c • / (ra — 1). On 
the other hand, if cr (1) = 6 for some 6 7^ 1, then we can treat the pair (1, 6) as though it were a 
single clement, with an incoming edge to 1 and an outgoing edge from h. Therefore the number 
of c-colored permutations u such that a (1) = 6 is / (n — 1). Combining, we obtain the recurrence 
relation 

/(n) = c-/(n-l) + (n-l)/(n-l) 
= (n c - 1) / (n - 1) . 

Together with the base case / (0) = 1, this implies that 

/ (n) = (n + c - 1) (n + c - 2) c 

'n-\- c—V" 
c- 1 



• n\. 



Hence 



E 



ncyc{<7) 



/(n) 



n + c — 1 
c- 1 



The above argument can be generalized to non-integer c using standard tricks (though we will not 
need that in the paper). ■ 

We can now compute E [P^] . 

Lemma 56 E [P^] =n + l. 

Proof. We have 



1 



E 



{n\y 
1 



Per {Xy Per (X) 



E 



(n!)^ x^g 
1 



^ ^ J__|_-^i,(7(i)-^i,r(i)'^i,a(i)'''«,/3(i) 
a,T,a,/3eSn i=l 



J2 M{a,T,a,P) 



where 



(^') a,T,a,PeS, 



M(a,T,a,P):= E 



.1=1 



nE [Xi^ai 



i=l 



(i)-^i,r(i)^j,a(i)-^i,/3(i) 



the last line following from the independence of the Gaussian variables Xij. 
We now evaluate M (a, r, a, j3). Write (jUr = aU/3if 

{(1, a (1)) , (1, T (1)) ,...,{n,a (n)) , (n, r (n))} = {(1, a (1)) , (1, /3 (1)) . . . , (n, a (n)) , (n, /3 (n))} . 
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If (T U r 7^ a U /3, then we claim that AI {a, r, a, j3) = 0. This is because the Gaussian distribution 
is uniform over phases — so if there exists an Xij that is not "paired" with its complex conjugate Xij 
(or vice versa), then the variations in that Xij will cause the entire product to equal 0. So suppose 
instead that a U t = aU p. Then for each i G [n] in the product, there are two cases. First, if 
a (i) ^ T {i), then 



E 



= E 
= E 

= 1. 



|'^i,(T(j)| |'^j,r('i)| 



.,a{i) I 



E 



i,T{i) I 



Second, if a {i) = t {i), then 



E 



)] 



E 



2. 



The result is that M (a, r, a, ^) = 2^^'^''^\ where K {a, r) is the number of i's such that a {i) 
Now let N (cr, r) be the number of pairs a,PeSn such that aUr = aU p. Then 

1 



T{i). 



2 ^ M(c7,r,a,/3) 

a,T,a,l3eSn 



2^(^.^)iV((T,r) 



E [2^(^'^)iV((T,r) 

-e5„ L 'i 

2^(^'^)iV(e,d, 



E 

a.reS, 



E 



where e denotes the identity permutation. Here the fourth line follows from symmetry — specifically, 
from the easily-checked identities K (a, t) = K {aa, ar) and TV (cr, t) = N {aa, ar) . 

We claim that the quantity 2^^'^'^'> N {e,^) has a simple combinatorial interpretation as 2'^^'^'^^'', 
where cyc(^) is the number of cycles in ^. To see this, consider a bipartite multigraph G with 
n vertices on each side, and an edge from left-vertex i to right-vertex j if i = j or ^ (i) = j (or 
a double-edge from i to j if z = j and ^ {i) = j). Then since e and ^ are both permutations, 
G is a disjoint union of cycles. By definition, K (e, ^) equals the number of indices i such that 
^ (i) = i — which is simply the number of double-edges in G, or equivalently, the number of cycles 
in ^ of length 1. Also, iV (e, ^) equals the number of ways to partition the edges of G into two 
perfect matchings, corresponding to a and fi respectively. In partitioning G, the only freedom we 
have is that each cycle in G of length at least 4 can be decomposed in two inequvalent ways. This 
implies that (e, ^) = 2^*^^^, where L (^) is the number of cycles in ^ of length at least 2 (note that 
a cycle in ^ of length k gives rise to a cycle in G of length 2k). Combining, 



Hence 



E [Pi] 



E 



n + 1 
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(1-a) 



by Proposition [55l ■ 

Using Lemma [56l we can now complete the proof of Theorem 1541 that Pr [Pn > a] > 
Proof of Theorem 1541 Let F denote the event that P„ > a, and let 6 := Pr [F]. Then 

1 = E [Pn] 
= Pr [F] E [Pn I F] + Pr [F] E [P„ I F] 

<5E[Pn [F] + a, 



2 



SO 



By Cauchy-Schwarz, this implies 

E [Pi I F] > 
and hence 



E[^n I F] > i-^. 



1 — a 



2 



E [Pi] = Pt[F]E[pI I F] +Pr [F] E [pI | F] 



• +0 

_ (l-af 



Now, we know from Lemma [56] that E [Pl] = n + 1. Rearranging, this means that 

{l-af 



6 > 



n + 1 



which is what we wanted to show. ■ 

A natural approach to proving Conjecture [6] would be to calculate the higher moments of 
Pn — E [Pn] ) E [Pn] , and so on — by generalizing Lemma [56l In principle, these moments would 
determine the probability density function of Pn completely. 

When we do so, here is what we find. Given a bipartite /c-regular multigraph G with n vertices 
on each side, let M (G) be the number of ways to decompose G into an ordered list of k disjoint 
perfect matchings. Also, let be the expectation of M {G) over a A:-regular bipartite multigraph 
G chosen uniformly at random. Then the proof of Lemma [56l extends to show the following: 

Theorem 57 E [P^] = for all positive integers k. 

However, while Mi = 1 and M2 = n + 1, it is also known that Mk ~ (k/e)^ for all k > 3: this 
follows from the van der Waerden conjecture, which was proved by Falikman [19] and Egorychev 
|18j in 1981. In other words, the higher moments of Pn grow exponentially with n. Because of 
this, it seems one would need to know the higher moments extremely precisely in order to conclude 
anything about the quantities of interest, such as Pr [Pn < a]. 
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9 The Hardness of Gaussian Permanents 



In this section, we move on to discuss Conjecture El which says that GPEx — the problem of 
multiphcatively estimating Per(X), where X ~ gn^n jg ^ Gaussian random matrix — is #P-hard. 
Proving Conjecture [5] is the central theoretical challenge that we leave 1^ 

Intuitively, Conjecture [5] implies that if P*'^ ^ BPP, then no algorithm for GPE X can run 
in time poly (n, 1/(5). Though it will not be needed for this work, one could also consider a 
stronger conjecture, which would say that if P*^ ^ BPP, then no algorithm for GPEx can run in 
time n^^^'^^ for any function /. 

In contrast to the case of the Permanent Anti-Concentration Conjecture, the question arises of 
why one should even expect Conjecture [5] to be true. Undoubtedly the main reason is that the 
analogous statement for permanents over finite fields is true: this is the random self-reducibility of 
the permanent, first proved by Lipton [39]. Thus, we are "merely" asking for the real or complex 
analogue of something already known in the finite field case. 

A second piece of evidence for Conjecture [5] is that, if X ~ gnxn ^ Gaussian matrix, then 
all known approximation algorithms fail to find any reasonable approximation to Per (AT). If X 
were a nonnegative matrix, then we could use the celebrated approximation algorithm of Jerrum, 
Sinclair, and Vigoda [30j — but since X has negative and complex entries, it is not even clear how to 
estimate Per (X) in BPP'^^, let alone in BPP. Perhaps the most relevant approximation algorithms 
are those of Gurvits [27j, which we discuss in Appendix [T2j In particular. Theorem 1661 will give 
a randomized algorithm due to Gurvits that approximates Per (X) to within an additive error 
ibe IIXII"", in O (n^/e^) time. For a Gaussian matrix X ~ g^x""-^ known that || A|| ~ 2^/n almost 
surely. So in O (n^/e^) time, we can approximate Per (A) to within additive error ±e{2^/n)'^. 
However, this is larger than what we need (namely ibe\/nl/ poly (n)) by a ~ {2^/e)^ factor. 

In the rest of this section, we discuss the prospects for proving Conjecture El First, in Section 
19.11 we at least show that exactly computing Per (X) for a Gaussian random matrix X ~ g'^xn 
#P-hard. The proof is a simple extension of the classic result of Lipton [39j, that the permanent 
over finite fields is "random self-reducible": that is, as hard to compute on average as it is in 
the worst case. As in Lipton's proof, we use the facts that (1) the permanent is a low-degree 
polynomial, and (2) low-degree polynomials constitute excellent error-correcting codes. However, 
in Section 19.21 we then explain why any extension of this result to show average- case hardness of 
approximating Per (X) will require a fundamentally new approach. In other words, the "polynomial 
reconstruction paradigm" cannot suffice, on its own, to prove Conjecture El 

9.1 Evidence That GPEx Is #P-Hard 

We already saw, in Theorem l28l that approximating the permanent (or even the magnitude of the 
permanent) of all matrices X G i[^n-xn ^g ^ :^P-hard problem. But what about the "opposite" 
problem: exactly computing the permanent of most matrices X ~ g^xn? ^.j^-g gg^tion, we 
will show that the latter problem is ^^P-hard as well. This means that, if we want to prove the 
Permanent-of-Gaussians Conjecture, then the difficulty really is just to combine approximation 
with an average-case assumption. 

■^■^Though note that, for our BosonSampling hardness argument to work, all we really need is that estimating 
Per (X) for Gaussian X is not in the class BPP'^'', and one could imagine giving evidence for this that fell short of 
#P-hardness. 
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Our result will be an adaptation of a famous result on the random-self-reducibility of the 
permanent over finite fields: 



Theorem 58 (Random-Self-Reducibility of the Permanent [39] , [23] , [24] , [12] ) For alia > 
l/poly(n) and primes p > (3n/a)^, the following problem is ^P-hard: given a uniform random 
matrix M € F^^", output Per (M) with probability at least a over M. 

The proof of Theorem [58] proceeds by reduction: suppose we had an oracle O such that 

Pr [O (M) = Per (M)] > a. 



Using O, we give a randomized algorithm that computes the permanent of an arbitrary matrix 
X G Fp^". The latter is certainly a #P-hard problem, which implies that computing Per (M) for 
even an a fraction of M's must have been #P-hard as well. 

There are actually four variants of Theorem [58l which handle increasingly small values of a. 
All four are based on the same idea — namely, reconstructing a low-degree polynomial from noisy 
samples — but as a gets smaller, one has to use more and more sophisticated reconstruction methods. 
For convenience, we have summarized the variants in the table below. 

Success probability a Reconstruction method Curve in F"^" Reference 



3n 



1 



4 poly(n) 

1 , 1 

2 poly(n) 

poly(n) 



Lagrange interpolation 
Berlekamp- Welch 
Berlekamp- Welch 
Sudan's list decoding [56 



Linear 
Linear 
Polynomial 
Polynomial 



Lipton [39] 
Gemmell et al. [23 
Gemmell-Sudan 
Cai et al. [12] 



In adapting Theorem [58] to matrices over C, we face a choice of which variant to prove. For 



simplicity, we have chosen to prove only the a = | 
believe that it should be possible to adapt the a = 



+ 



' + 



variant in this paper. However, we 



and a 



1 

poly(n) 



variants to the 



2 ' poly(n) 

complex case as well; we leave this as a problem for future work. 

Let us start by explaining how the reduction works in the finite field case, when a = j + 6 for 
some 6 = -^^^^j^- Assume we are given as input a matrix X G F^^", where p > n/6 is a prime. 
We are also given an oracle O such that 



Pr [O (M) 

Then using O, our goal is to compute Per (X). 

We do so using the following algorithm 
random. Then set 



Per (M)] >^ + S. 



First choose another matrix Y £ ]p^x" 



uniformly at 



X (t) :=X + tY, 
q{t) :=Per {X (t)) . 

Notice that q (t) is a univariate polynomial in t, of degree at most n. Furthermore, q (0) = 
Per(X(0)) = Per (AT), whereas for each t ^ 0, the matrix X (t) is uniformly random. So by 
assumption, for each t 7^ we have 



Ft[0{X {t))=q{t)]>-+6. 
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Let S be the set of all nonzero t such that O {X (t)) = q{t). Then by Markov's inequality, 



Pr 



\S\>[\ + 5]{p-l) 



— 1 X — 9 



2 



So if we can just compute Per {X) in the case where IS*] > (1/2 + 5) (p — 1), then all we need to 
do is run our algorithm O (1/(5^) times (with different choices of the matrix Y), and output the 
majority result. 

So the problem reduces to the following: reconstruct a univariate polynomial g : Fp — )■ Fp of 
degree n, given "sample data" O {X (1)) , . . . , O (X (p - 1)) that satisfies q{t) = O {X {t)) for at 
least a ^ + 5 fraction of t's. Fortunately, we can solve that problem efficiently using the well-known 
Berlekamp- Welch algorithm: 

Theorem 59 (Berlekamp- Welch Algorithm) Let q be a univariate polynomial of degree d, 
over any field F. Suppose we are given m pairs of ¥-elements (xi,yi) , . . . , (x 

■mi Vm) (with the 

Xi's all distinct), and are promised that yi = q{xi) for more than ^I^i^ values of i. Then there is 
a deterministic algorithm to reconstruct q, using poly (n, d) field operations. 

Theorem applies to our scenario provided p is large enough (say, at least n/5). Once we 
have the polynomial q, we then simply evaluate it at to obtain q (0) = Per {X). 

The above argument shows that it is ^P-hard to compute the permanent of a "random" 
matrix — but only over a sufficiently-large finite field F, and with respect to the uniform distri- 
bution over matrices. By contrast, what if F is the field of complex numbers, and the distribution 
over matrices is the Gaussian distribution, ^"■><"-? 

In that case, one can check that the entire argument still goes through, except for the part where 
we asserted that the matrix X (t) was uniformly random. In the Gaussian case, it is easy enough to 
arrange that X (t) ~ ^"X"- for some fixed t ^ 0, but we can no longer ensure that X (t) ~ g^x"^ for 
all t ^ simultaneously. Indeed, X (t) becomes arbitrarily close to the input matrix ^ (0) = X 
as t — >• 0. Fortunately, we can deal with that problem by means of Lemma HSj which implies that, 
if the matrix M G ([^nxn -g g^mpled from ^"X" and if is a small shift, then M + ii^ is nearly 
indistinguishable from a sample from ^"-x". Using Lemma HS} we now adapt Theorem 1581 to the 
complex case. 

Theorem 60 (Random Self-Reducibility of Gaussian Permanent) For all 6 >1/ poly (n), 
the following problem is ^P-hard. Given an n x n matrix M drawn from ^"^"^ output Per (M) 
with probability at least j + 6 over M. 

Proof. Let X = (xij) G {0,1}"^"" be an arbitrary 0/1 matrix. We will show how to compute 
Per (X) in probabilistic polynomial time, given access to an oracle O such that 

Pr [O (M) = Per (M)l > ? + 5. 

Clearly this suffices to prove the theorem. 

The first step is to choose a matrix Y € C"^" from the Gaussian distribution ^"X"-. Then 
define 

X{t) := {l-t)Y + tX, 
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so that X{0) = Y and X (1) = X. Next define 

qit) :=Per(X (t)) , 

so that q (t) is a univariate polynomial in t of degree at most n, and g (1) = Per {X (1)) = Per (X). 
Now let L := [n/(5] and e := (4„ii^2n)L • -^^^ each £ G [L], call the oracle O on input matrix 
Then, using the Berlekamp- Welch algorithm (Theorem I59p . attempt to find a degree-n 
polynomial : C — t- C such that 

q' (el) = 0{X (ee)) 

for at least a | + 5 fraction of ^ G [L]. If no such q' is found, then fail; otherwise, output q' (1) as 
the guessed value of Per (X) . 

We claim that the above algorithm succeeds (that is, outputs q' (1) = Per (X)) with probability 
at least ^ + | over Y. Provided that holds, it is clear that the success probability can be boosted 
to (say) 2/3, by simply repeating the algorithm O (l/^^) times with different choices of Y and then 
outputting the majority result. 

To prove the claim, note that for each i G [L], one can think of the matrix X (ei) as having 
been drawn from the distribution 

n 

Let 

n 

Then by the triangle inequality together with Lemma l48| 

< 2n^ee + ^n'^ {elf 

< {2n^ + n) eL 
6 

< -. 
- 2 



Hence 



Pr [O (X (ei)) = q (ei)] > ^ + 5 - \\Ve - M (0, 

3 6 

Now let S be the set of all i G [L] such that O (X (ei)) = q (ei). Then by Markov's inequality, 

Pr 



|S|>(5 + 5|i 



1 1 X 

T — 9 i 

> 1 - 4 — T > - + -• 

2 2 



Furthermore, suppose IS"! > + |) L. Then by Theorem 159^ the Berlekamp- Welch algorithm will 
succeed; that is, its output polynomial q' will be equal to q. This proves the claim and hence the 
lemma. ■ 
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As mentioned before, we conjecture that it is possible to improve Theorem 1601 to show that it 
is #P-hard even to compute the permanent of an a = p^iy^n) faction of matrices X drawn from 
the Gaussian distribution 

Let us mention two other interesting improvements that one can make to Theorem [60l First, 
one can easily modify the proof to show that not just Per (X), but also |Per is as hard to 

compute for X drawn from the Gaussian distribution as it is in the worst case. For this, 

one simply needs to observe that, just as Per (X) is a degree-n polynomial in the entries of X, so 
I Per (X)l^ is a degree-2n polynomial in the entries of X together with their complex conjugates (or 
alternatively, in the real and imaginary parts of the entries) . The rest of the proof goes through as 
before. Since |Per is #P-hard to compute in the worst case by Theorem 1281 it follows that 

|Per (X)l^ is T^P-hard to compute for X drawn from the Gaussian distribution as well. 

Second, in the proof of Theorem 1601 one can relax the requirement that the oracle O computes 
Per (X) exactly with high probability over X ~ gn>in^ ^^^^ merely require that 



Pr 



|C'(X) -Per(X)| < 2 



-q{n) 



3 1 

> T + 



4 poly (n) ' 

for some sufficiently large polynomial q. To do so, one can appeal to the following lemma of Paturi. 

Lemma 61 (Paturi |43j; see also Buhrman et al. Let p : M — )• M 6e a real polynomial 

of degree d, and suppose \p {x)\<5 for all \x\ < e. Then \p{l)\ < 5e2'^(i+V^). 

From this perspective, the whole challenge in proving the Permanent-of-Gaussians Conjecture 
is to replace the 2"'^^""^ approximation error with l/g(n). 

Combining, we obtain the following theorem, whose detailed proof we omit. 

Theorem 62 There exists a polynomial p for which the following problem is ^P-hard, for all 
6 > 1/ poly (n). Given an n x n matrix X drawn from ^"-^"^ output a real number y such that 
y - |Per {X)\^ < 2-P('^'V5) probability at least 1 + 6 over X. 

As a final observation, it is easy to find some efficiently samplable distribution T> over matrices 
X G C"^", such that estimating Per (X) or |Per (X)p for most X ~ 2? is a #P-hard problem. To do 
so, simply start with any problem that is known to be ^^P-hard on average: for example, computing 
Per (M) for most matrices M S F^^" over a finite field Fp. Next, use Theorem [28] to reduce the 
computation of Per (M) (for a uniform random M) to the estimation of |Per (Xi)|^ , • • • , |Per {Xm)\^ , 
for various matrices Xi, . . . ,Xm & C"^". Finally, output a random Xi as one's sample from T>. 
Clearly, if one could estimate |Per(X)|^ for a 1 — 1/ poly (n) fraction of X ~ P, one could also 
compute Per (M) for a 1 — 1/ poly (n) fraction of M G F^^", and thereby solve a ^^P-hard problem. 
Because of this, we see that the challenge is "merely" how to prove average-case #P-hardness, in the 
specific case where the distribution T) over matrices that interests us is the Gaussian distribution 
gnxn ^^j. t^q^q generally, some other "nice" or "uniform-looking" distribution). 

9.2 The Barrier to Proving the PGC 

In this section, we identify a significant barrier to proving Conjecture O and explain why a new 
approach seems needed. 
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As Section [9TT] discussed, all existing proofs of the worst-case/average-case equivalence of the 
Permanent are based on low-degree polynomial interpolation. More concretely, given a matrix 
X G JF^-x" for which we want to compute Per (X), we first choose a random low-degree curve X (t) 
through F"^" satisfying X (0) = X. We then choose nonzero points ti, . . . ,tm G M, and compute 
or approximate Per {X (tj)) for all i € [m], using the assumption that the Permanent is easy on 
average. Finally, using the fact that q (t) := Per {X (t)) is a low-degree polynomial in t, we perform 
polynomial interpolation on the noisy estimates 

yi ^q{ti),-- ■,ym ^q{tm) , 

in order to obtain an estimate of the worst-case permanent q (0) = Per (X (0)) = Per (X). 

The above approach is a very general one, with different instantiations depending on the base 
field F, the fraction of X's for which we can compute Per {X), and so forth. Nevertheless, we claim 
that, assuming the Permanent Anti-Concentration Conjecture, the usual polynomial interpolation 
approach cannot possibly work to prove Conjecture O Let us see why this is the case. 

Let X G c^x" a matrix where every entry has absolute value at most 1. Then certainly it is a 
#P-hard problem to approximate Per (X) multiplicatively (as shown by Theorem I28| for example). 
Our goal is to reduce the approximation of Per (X) to the approximation of Per (Xi) , . . . , Per (Xm), 
for some matrices Xi, . . . ,Xm, that are drawn from the Gaussian distribution ^"^n ^j. something 
close to it. 

Recall from Section [8] that 

E 



which combined with Markov's inequality yields 



I Per 



n!, 



Pr 



|Per(X)| > kVnl < ^ (7) 



for all k > 1. But this already points to a problem: |Per(X)| could, in general, be larger than 
|Per (-'^i)! , . . . , I Per by an exponential factor. Specifically, |Per {X)\ could be as large as n! 

(for example, if A is the all-l's matrix). By contrast, |Per {Xi)\ , . . . , |Per {Xm)\ will typically be 
0{^/n\) by equation ([7]). And yet, from constant-factor approximations to Per [Xi] , . . . , Per (Xm), 
we are supposed to recover a constant-factor approximation to Per(X), even in the case that 
|Per(X)| is much smaller than n\ (say, |Per(X)| ~ ^/n}.). 

Why is this a problem? Because polynomial interpolation is linear with respect to additive 
errors. And therefore, even modest errors in estimating Per {Xi) , . . . , Per {Xm) could cause a large 
error in estimating Per {X). 

To see this concretely, let X be the n x n all-l's matrix, and X (t) be a randomly-chosen curve 
through C"^" that satisfies X (0) = X. Also, let ti,. . . ,tm G M be nonzero points such that, as 
we vary X, each X (ti) is close to a Gaussian random matrix X ~ ^"^n ^-^yg need not assume 
that the X (tj)'s are independent.) Finally, let go (t) ■= Per {X (t)). Then 

(i) ko (^i)l T ■ ■ Aqo (^m)l are each at most n^^^^Vn^. with high probability over the choice of X, 
but 

(ii) \qo{0)\ = |Per(X(0))| = |Per(X)| = n\. 
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Here (i) holds by our assumption that each X [ti] is close to Gaussian, together with equation 

m- 

All we need to retain from this is that a polynomial go with properties (i) and (ii) exists, within 
whatever class of polynomials is relevant for our interpolation problem. 

Now, suppose that instead of choosing X to be the all-l's matrix, we had chosen an X such that 
|Per (^)l < \/nI- Then as before, we could choose a random curve X (t) such that X (0) = X and 
X (ti) , . . . ,X (tm) are approximately Gaussian, for some fixed interpolation points ti, . . . ,tm € 
Then letting q{t) := Per {X (t)), we would have 

(i) \q {ti)\ , . . . ,\q (im)| are each at least -v/nl/n'^^^) with high probability over the choice of X, 
and 



(ii) \q (0)1 = |Per {X (0))| = |Per {X)\ < Vn\. 

Here (i) holds by our assumption that each X (ti) is close to Gaussian, together with Conjecture 
E] (the Permanent Anti-Concentration Conjecture). 
Now define a new polynomial 

q{t) := q (t) + 7^0 (t) , 
where, say, I7I = 2^". Then for all i G [m], the difference 

nO(i) 

\q (ti) - q {ti)\ = 1790 (*i)| < -^Vnl, 

is negligible compared to \/rjI- This means that it is impossible to distinguish the two polynomials 
q and q, given their approximate values at the points ti, . . . , tm.. And yet the two polynomials have 
completely different behavior at the point 0: by assumption |g(0)| < \/nI, but 

|g(0)| > 1790 (0)1 -k (0)1 

n! r-, 
> — - Vn!. 
- 2" 

We conclude that it is impossible, given only the approximate values of the polynomial q (t) := 
Per (X (t)) at the points ti, ... ,tm, to deduce its approximate value at 0. And therefore, assuming 
the PACC, the usual polynomial interpolation approach cannot suffice for proving Conjecture O 

Nevertheless, we speculate that there is a worst-case/average-case reduction for approximating 
the permanents of Gaussian random matrices, and that the barrier we have identified merely 
represents a limitation of current techniques. So for example, perhaps one can do interpolation 
using a restricted class of low-degree polynomials, such as polynomials with an upper bound on 
their coefficients. To evade the barrier, what seems to be crucial is that the restricted class of 
polynomials one uses not be closed under addition. 

Of course, the above argument relied on the Permanent Anti-Concentration Conjecture, so one 
conceivable way around the barrier would be if the PACC were false. However, in that case, the 
results of Section [7] would fail: that is, we would not know how to use the hardness of GPEx to 
deduce the hardness of |GPE|_|_ that we need for our application. 
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10 Open Problems 



The most exciting challenge we leave is to do the experiments discussed in Section [6l whether in 
linear optics or in other physical systems that contain excitations that behave as identical bosons. 
If successful, such experiments have the potential to provide the strongest evidence to date for 
violation of the Extended Church- Turing Thesis in nature. 
We now list a few theoretical open problems. 

(1) The most obvious problem is to prove Conjecture[5](the Permanent-of-Gaussians Conjecture): 
that approximating the permanent of a matrix of i.i.d. Gaussian entries is ^^P-hard. Failing 
that, can we prove ^^P-hardness for any problem with a similar "flavor" (roughly speaking, 
an average-case approximate counting problem over M or C)? Can we at least find evidence 
that such a problem is not in BPP'^^? 

(2) Another obvious problem is to prove Conjecture [6] (the Permanent Anti-Concentration Con- 
jecture), that |Per(X)| almost always exceeds poly (n) for Gaussian random matrices 
X ~ AA(0,1)J^^". Failing that, any progress on understanding the distribution of Per (X) 
for Gaussian X would be interesting. 

(3) Can we reduce the number of modes needed for our linear-optics experiment, perhaps from 
O (n2) to O (n)? 

(4) How does the noninteracting-boson model relate to other models of computation that are 
believed to be intermediate between BPP and BQP? To give one concrete question, can every 
boson computation be simulated by a qubit-based quantum circuit of logarithmic depth? 

(5) Using quantum fault-tolerance techniques, can one decrease the effective error in our exper- 
iment to l/exp(n) — thereby obviating the need for the mathematical work we do in this 
paper to handle 1/ poly (n) error in variation distance? Note that, if one had the resources 
for universal quantum computation, then one could easily combine our experiment with stan- 
dard fault-tolerance schemes, which are known to push the effective error down to 1/exp (n) 
using poly (n) computational overhead. So the interesting question is whether one can make 
our experiment fault-tolerant using fewer resources than are needed for universal quantum 
computing — and in particular, whether one can do so using linear optics alone. 

(6) Can we give evidence against not merely an FPTAS (Fully Polynomial Time Approximation 
Scheme) for the BosonSampling problem, but an approximate sampling algorithm that 
works for some fixed error e > 1/ poly (n)? 

(7) For what other interesting quantum systems, besides linear optics, do analogues of our hard- 
ness results hold? As mentioned in Section 11.41 the beautiful work of Bremner, Jozsa, and 
Shepherd [10] shows that exact simulation of "commuting quantum computations" in clas- 
sical polynomial time would collapse the polynomial hierarchy. What can we say about 
approximate classical simulation of their model? 

(8) In this work, we showed that unlikely complexity consequences would follow if classical com- 
puters could simulate quantum computers on all sampling or search problems: that is, that 
SampP = SampBQP or FBPP = FBQP. An obvious question that remains is, what about 
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decision problems? Can we derive some unlikely collapse of classical complexity classes from 
the assumption that P = BQP or PromiseP = PromiseBQP? 

(9) Is there any plausible candidate for a decision problem that is efficiently solvable by a boson 
computer, but not by a classical computer? 

(10) As discussed in Sectional it is not obvious how to convince a skeptic that a quantum computer 
is really solving the BosonSampling problem in a scalable way. This is because, unlike 
with (say) Factoring, neither BosonSampling nor any related problem seems to be in NP. 
How much can we do to remedy this? For example, can a prover with a BosonSampling 
oracle prove any nontrivial statements to a BPP verifier via an interactive protocol? 

(11) Is there a polynomial-time classical algorithm to sample from a probability distribution T>' 
that cannot be efficiently distinguished from the distribution D sampled by a boson computer? 
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12 Appendix: Positive Results for Simulation of Linear Optics 

In this appendix, we present two results of Gurvits, both of which give surprising classical polynomial- 
time algorithms for computing certain properties of linear-optical networks. The first result, 
which appeared in [27], gives an efficient randomized algorithm to approximate the permanent of a 
(sub)unitary matrix with ibl/poly (n) additive error, and as a consequence, to estimate final ampli- 
tudes such as (ln| ip (U) \ln) = Per {Un,n) with ±1/ poly (n) additive error, given any linear-optical 
network U. This ability is of limited use in practice, since (1„| f{U) |1„) will be exponentially 
small for most choices of U (in which case, is also a good additive estimate!). On the other 
hand, we certainly do not know how to do anything similar for general, qubit-based quantum 
circuits — indeed, if we could, then BQP would equal BPP. 

Gurvits's second result (unpublished) gives a way to compute the marginal distribution over 
photon numbers for any k modes, deterministically and in n'^^^^ time. Again, this is perfectly 
consistent with our hardness conjectures, since if one wanted to sample from the distribution over 
photon numbers (or compute a final probability such as |(1„| ip (U) |1„)|^), one would need to take 
k > n. 

To prove Gurvits's first result, our starting point will be the following identity of Ryser, which 
is also used for computing the permanent of an n x n matrix in O (2"n^) time. 

Lemma 63 (Ryser's Formula) For all V G C"^", 



Per {V) = E 

xi,...,x„e{-i,i} 



Xl--- XnY[ (VilXi H h VinXr. 



i=l 



Proof. Let p {xi, . . . , x„) be the degree-n polynomial that corresponds to the product in the above 
expectation. Then the only monomial of p that can contribute to the expectation is xi ■ ■ ■ Xn, since 
all the other monomials will be cancelled out by the multiplier of xi • • • (which is equally likely 
to be 1 or —1). Furthermore, as in Lemma EH the coefficient of xi ■ ■ ■ Xn is just 

n 

E n^*--«=Per(F). 
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Therefore the expectation equals 

Per(y) E [xl---xl]=Pev{V). 

xi,...,x„e{-i,ii 

(Indeed, all we needed about the random variables that they were independent and 

had mean and variance 1.) ■ 

Given x = (xi, . . . , x„) G { — 1, l}", let 



Rys^ (V) := xi • • • JJ {vnxi H h 



i=l 



Then Lemma [63] savs that Rys^. (V) is an unbiased estimator for the permanent, in the sense that 
Ex [Rys^, (y)] = Per (V). Gurvits [27] observed the following key further fact about Rys^ {V). 

Lemma 64 |Rys^ {V)\ < 11^11" for all x G {-1, 1}" and all V. 

Proof. Given a vector x = (xi, . . . , Xn) all of whose entries are 1 or —1, let y = Vx, and let 

yi:=ViiXi-\ h VinXn 

be the i^^ component of y. Then ||x|| = ^/n, so ||y|| < ||y|| ||x|| = ||y|| ^/n. Hence 

l^ySx (^)l = \xi---Xnyi---yn\ 

= \yi---yn\ 

\yi\ H h |2/r 



n 

1 1 \ ri. 

y\ 



< 
< 

In 

< ll^ll", 

where the third line follows from the arithmetic-geometric mean inequality, and the fourth line 
follows from Cauchy-Schwarz. ■ 

An immediate consequence of Lemma [64] is the following: 

Corollary 65 |Per (y)| < for all V. 

Another consequence is a fast additive approximation algorithm for Per iV), which works when- 
ever ||y|| is small. 

Theorem 66 (Gurvits's Permanent Approximation Algorithm [27| ) There exists a random- 
ized (classical) algorithm that takes a matrix V € C"^" as input, runs in O (n^/e^) time, and with 
high probability, approximates Per (y) to within an additive error zize\\V\\^. 



Proof. By Lemma 



Per(y)= E [RysAV)]. 

a;e{-l,l}" 
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Furthermore, we know from Lemma [Ml that |Rys^. {V)\ < for every x. So our approximation 

algorithm is simply the following: for T = O (l/e^), first choose T vectors x (1) , . . . , x (T) uniformly 
at random from { — 1, 1}". Then output the empirical mean 

1 ^ 

t=i 

as our estimate of Per (V). Since Rys^ {V) can be computed in O (n^) time, this algorithm takes 
O (n^/e^) time. The failure probability, 

Fv [\p-FeriV)\>e\\Vr], 

x{l),...,x{T) 

can be upper-bounded using a standard Chernoff bound. ■ 

In particular, Theorem 1661 implies that, given an n x n unitary matrix U, one can approximate 
Per (U) to within an additive error ite (with high probability) in poly (n, 1/e) time. 

We now sketch a proof of Gurvits's second result, giving an n'^^'^^-time algorithm to compute 
the marginal distribution over any k photon modes. We will assume the following lemma, whose 
proof will appear in a forthcoming paper of Gurvits. 

Lemma 67 (Gurvits) Let V £ C"X" a matrix of rank k. Then Per {V + /) can be computed 
exactly in nP^^'f time. 

We now show how to apply Lemma [671 to the setting of linear optics. 

Theorem 68 (Gurvits's A;-Photon Marginal Algorithm) There exists a deterministic clas- 
sical algorithm that, given a unitary matrix U £ ([^rnxm^ indices G [m], and occupation 
numbers ji, . . . , jk £ {0, . . . , n}, computes the joint probability 



Pr [si, = ji A • • • A Si, = jk] 

S = (Si,...,Sm)'^l^U 



in n^'^^^ time. 



Proof. By symmetry, we can assume without loss of generality that (ii, . . . ,ik) = {1, . . . ,k). Let 
c = (ci, . . . ,Cfc) be an arbitrary vector in C'^. Then the crucial claim is that we can compute the 
expectation 



E 



|ci| ■■■\ck\ 



^ Pr [si, . . . Icil^''^ • • • Icfcl^** 



\k 



in n'~'^^^ time. Given this claim, the theorem follows easily. We simply need to choose (n + 1) 



values for I ci I , . . . , I Cfc I , compute 'Es~Vr 



I |2si I |2st 

|ci| •••|cfc| 



for each one, and then solve the resulting 



system of (n + l)'^ independent linear equations in (n + l)'^ unknowns to obtain the probabilities 
Pr [si, . . . , Sk] themselves. 

We now prove the claim. Let Ic ■ C™" — )• be the diagonal linear transformation that 
maps the vector (xi, . . . , Xm) to (cixi, . . . , c^x^, x^+i, . . . , Xm), and let /|^|2 = lllc be the linear 

transformation that maps (xi, . . . , Xm) to ^|cip xi, . . . , |cfc|^ x^, x^+i, . . . , x^^ ■ Also, let 

U [Jm,n] (x) = ^ asX^. 



m.n 
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Now define a polynomial q by 
and note that 



g(x) := IcU [Jm,n] {x) , 
(2;) = asx^cl^ ■ ■ ■ c 



Hence 



E 



5=(si,...,Sm,)~X'j/ L 



|ci| ■■■\Ck\ 



i2si 



[\as\^ Si\- ■ ■ Sra^.J ■ ■ ■ \Ck\ 

S={si,...,Sm)&^m,n 

(a5cr---4^)(asc^---4^).i!---.J 

S=(si,...,Sm)e<I>m,n 

= {q,q) ■ 

Now, 

(9, Q) = {IcU [Jm,n] , IcU [Jm,n]) 
U [Jm,n] ; I\c\'^U [Jm,n 
^Jin,m U^ I^i^^iLf [Jiri,n 

= Per((f/t/„.f/)J 

where the second and third lines follow from Theorem 1 17|, and the fourth line follows from Lemma 
[2TI Finally, let A := /|^|2 — /. Then A is a diagonal matrix of rank at most k, and 



u^iid^u 



(A + /) u) 

\ / n,n 

^ / n,n 

= V + I, 

where V := (U^AU)^ ^ is an n x n matrix of rank at most k. So by Lemma [671 we can compute 



Per {V + I)= E 

5=(si,...,Sm)~©(7 



I |2si I |2sfc 



in n*^^*'') time. Furthermore, notice that we can compute V itself in O [n^k^ = rfi^^^ time, 
independent of m. Therefore the total time needed to compute the expectation is rfi'^^^'^^^^^ = 
jiO{k) ^ This proves the claim. ■ 



13 Appendix: The Bosonic Birthday Paradox 

By the birthday paradox, we mean the statement that, if n balls are thrown uniformly and inde- 
pendently into m bins, then with high probability we will see a collision (i.e., two or more balls in 
the same bin) if m = O (n^) , but not otherwise. 
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In this appendix, we prove the useful fact that the birthday paradox still holds if the balls are 
identical bosons, and "throwing" the balls means applying a Haar-random unitary matrix. More 
precisely, suppose there are m modes, of which the first n initially contain n identical photons (with 
one photon in each mode) and the remaining m — n are unoccupied. Suppose we mix the modes by 
applying an m x m unitary matrix U chosen uniformly at random from the Haar measure. Then 
if we measure the occupation number of each mode, we will observe a collision (i.e., two or more 
photons in the same mode) with probability bounded away from if m = O (n^) but not otherwise. 

It is well-known that identical bosons are "gregarious," in the sense of being more likely than 
classical particles to occur in the same state. For example, if we throw two balls uniformly and 
independently into two bins, then the probability of both balls landing in the same bin is only 1/2 
with classical balls, but 2/3 if the balls are identical bosonsEl So the interesting part of the bosonic 
birthday paradox is the "converse direction": when n^, the probability of two or more bosons 

landing in the same mode is not too large. In other words, while bosons are "somewhat" more 
gregarious than classical particles, they are not so gregarious as to require a different asymptotic 
relation between m and n. 

The proof of our main result. Theorem [3l implicitly used this fact: we needed that when 
m ^ n?, the basis states with two or more photons in the same mode can safely be neglected. 
However, while in principle one could extract a proof of the bosonic birthday paradox from the 
proof of Theorem [3l we thought it would be illuminating to prove the bosonic birthday paradox 
directly. 

The core of the proof is the following simple lemma about the transition probabilities induced 
by unitary matrices. 

Lemma 69 (Unitary Pigeonhole Principle) Partition a finite set [M] into a "good part" G 
and "bad part" B = [M] \ G. Also, let U = {uxy) he any M x M unitary matrix. Suppose we 
choose an element x £ G uniformly at random, apply U to \x), then measure U \ x) in the standard 
basis. Then letting y be the measurement outcome, we have Pr [y £ B] < \B\ / \G\. 

Proof. Let i? be an M x M doubly-stochastic matrix whose {x,y) entry is rxy := Then 
applying [/ to a computational basis state \x) and measuring immediately afterward is the same as 
applying R; in particular, Pr [y £ B] = rxy. Moreover, 

x,yeG xeG,ye[M] x£[M],y£G x,ye[M] x,yeB 

= \G\ + \G\-M+ J2 ^-y 

x,y&B 

> 2|G| - M, 

where the first line follows from simple rearrangements and the second line follows from the double- 
stochasticity of R. Hence 



Pr[yGG]= E 



y&G 



2\G\-M ^ \B\ 
- \G\ \G[ 



^^This is in stark contrast to the situation with identical fermions, no two of which ever occur in the same state 
by the PauU exclusion principle. 
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and 

Pr[yGB] = 1-Pr[y gG] < 

■ 

Lemma [69] has the fohowing important corollary. Suppose we draw the M x M unitary matrix 
U from a probability distribution Z, where Z is symmetric with respect to some transitive group 
of permutations on the good set G. Then Pr [y G B] is clearly independent of the choice of initial 
state X €z G. And therefore, in the statement of the lemma, we might as well fix x ^ G rather 
than choosing it randomly. The statement then becomes: 

Corollary 70 Partition a finite set [M] into a "good part" G and "had part" B = [M] \ G. Also, 
let r < Sm be a permutation group that is transitive with respect to G, and let Z he a prohahility 
distribution over M x M unitary matrices that is symmetric with respect to T. Fix an element 
X £ G. Suppose we draw a unitary matrix U from Z, apply U to \x), and measure U \x) in the 
standard basis. Then the measurement outcome will belong to B with probability at most \B\ / \G\. 

Given positive integers m > n, recall that ^m,n is the set of lists of nonnegative integers 
S = (si, . . . , Sm) such that si + • • • + Sm = n. Also, recall from Theorem [3] that a basis state 
S G $m,n is called collision-free if each Sj is either or 1. Let Gm,n be the set of collision- free S"s, 
and let -Bm,n = ^m,n \ Gm,n- Then we have the following simple estimate. 



Proposition 71 
Proof. 

ICmjnl _ in) 



\Gm,n\ ^ ^ ^ 



\^m..r,.\ m 



\(f) I fm+n-l\ 

ml [m — 1)! 



(m — n)\ (m + n — 1)! 



n — 1\ f n — 1\ ( n — 1 

1 1 



n 



m J \ m + 1 J V m + n — 1 

2 



> 1 

m 

■ 

Now let U he an m X m unitary matrix, and recall from Section [3.11 that (p {U) is the "lifting" 
of U to the n-photon Hilbert space of dimension M = (™''^"~"^) • Also, let A = A (U, n) be the 
m X n matrix corresponding to the first n columns of U . Then recall that Pa is the probability 
distribution over „ obtained by drawing each basis state 5 G <I>m n with probability equal to 

\{lnW{U)\S)\\ 

Using the previous results, we can upper-bound the probability that a Haar-random unitary 
maps the basis state to a basis state containing two or more photons in the same mode. 

Theorem 72 (Boson Birthday Bound) Recalling that 'Hm,m is the Haar measure over m x m 
unitary matrices, 

1 2r?2 

Pr [SeB,r,,n] 



E 

m.m 



< 

m 
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Proof. Given a permutation a S Sm of single-photon states (or equivalently of modes), let 
ip (cr) be the permutation on the set $m,n of n-photon states that is induced by a, and let 
r := {if (a) : a G Sm}- Then F is a subgroup of Sm of order ml (where as before, M = C""^^"^))- 
Furthermore, F is transitive with respect to the set Gm,n, since we can map any collision- free basis 
state S G Gm,n to any other collision-free basis state S' G Gm,n via a suitable permutation a G Sm 
of the underlying modes. 

Now let U be the probability distribution over M x M unitary matrices V that is obtained 
by first drawing an m x m unitary matrix U from 'Hm,m and then setting V := ^{U). Then 
since T-Lm,m is symmetric with respect to permutations a G Sm, it follows that l/( is symmetric with 
respect to permutations ip {a) G Sm- 



We want to upper-bound E[/g-^^ 



A(U,n) 



[SeB„ 



after choosing an m x m unitary U from Tim m , applying t 



This is simply the probability that, 
le M X M unitary ip (U) to the basis 



state I In), and then measuring in the Fock basis, we obtain an outcome in Gn 



So 



E 



Pr [S G B„ 



A(U,n) 



< 



\B. 



m,,n I 



\Gn 



< 



n? /m 
1 — rfi jra 



Here the first inequality follows from Corollary 1701 together with the fact that !„ G Gm,n, while the 
second inequality follows from Proposition 1711 Since the expectation is in any case at most 1, we 
therefore have an upper bound of 



mm 



11? /m 
1 — n^/m' 



1 < 



2n2 



m 
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